interactive knowledge discovery over web of data
TRANSCRIPT
Interactive Knowledge Discovery over Web of Data
Equipe Orpailleur
Supervised by Amedeo Napoli, Malika Smail-Tabbone
Mehwish Alam
Mehwish Alam Interactive Knowledge Discovery over Web of Data 1
1 Motivation
Towards Web of Data
Query
List of Turing Award Winners
A more specific Query
List of American Turing Award Winners
Requires Resource Integration
- List of Turing Award Winners
- List of American Computer Scientists
From Web of Documents to Web of Data
Characteristics of Web of Documents
- Unstructured Data
- Human Processable
- Not easily processed by machines
[Figure: RDF graph with nodes Leslie Lamport, Turing Award, Scientist, United States, Washington DC and English, connected by edges labeled dbo:birthPlace, dbp:award, rdf:type, dbo:capital and dbo:officialLanguage]
Google Knowledge Graph
Can we perform data analysis over Web of Data?
Possible Use-Cases
- groups of authors working together
- detect the diversity of an author
- detect major area of research for an author
- given a paper, is it possible to retrieve similar published papers?
What do we need?
- A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
- A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert
Contributions
- Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
- Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
[Figure: Interactive Knowledge Discovery + Completion]
Related Work
Our contributions are in the direction of the following works
- Sewelis (Camelis 2) [Ferré and Hermann 2012]
- Navigala [Visani et al 2011]
- Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille 1999]
- A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
- Galois connection
  $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
- $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
- Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×

[Figure: Concept Lattice]
- Object Concept for $g \in G$
  - Object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
- Attribute Concept for $m \in M$
  - Attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
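As a minimal sketch, the derivation operators and the object and attribute concepts can be computed as follows. The dictionary encodes the running context with the crosses placed as implied by the two examples above; the function names are illustrative and not taken from the talk.

```python
# Running formal context: objects g1..g5, attributes m1..m3.
# Cross positions are inferred from the slide examples (an assumption).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def common_attributes(objects):
    """A': the attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes):
    """B': the objects that carry every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """Object concept (g'', g')."""
    intent = common_attributes({g})
    return common_objects(intent), intent


def attribute_concept(m):
    """Attribute concept (m', m'')."""
    extent = common_objects({m})
    return extent, common_attributes(extent)


print(object_concept("g2"))
print(attribute_concept("m1"))
```

With this encoding, the two calls reproduce the object concept of g2 and the attribute concept of m1 given on the slide.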
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.
- $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
- The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association Rule: $m_2 \rightarrow m_3$
  $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $conf(m_2 \rightarrow m_3) = 3/4 = 75\%$
Implication: $m_3 \Rightarrow m_2$
  $conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
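The support and confidence values above can be checked with a short sketch. It assumes the same cross positions for the running context as implied by the examples on the slides; `extent`, `support`, `confidence` and `implication_holds` are illustrative names.

```python
from fractions import Fraction

# Running formal context (cross positions inferred from the slide examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that carry every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|."""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'|."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}))
```

Exact fractions are used so that 3/5 and 3/4 come out without rounding.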
Complex Data
Formal Concept Analysis cannot deal with such complex data

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5
Numeric Data

[Figures: Graphs and Molecular Structure; Syntactic Tree; Linked Data]
Pattern Structures
[Ganter and Kuznetsov 2001]
- A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5

- $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
- $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
- $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
- $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
- The maximal description representing the similarity of a set of objects
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
- The maximal set of objects sharing a given description
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$
Pattern Concept
- $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
- $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
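The interval pattern structure of the three-object example can be sketched as below. The code takes subsumption to mean that d ⊑ e holds exactly when d ⊓ e = d, which is the usual convention but is stated here as an assumption; all function names are illustrative.

```python
from functools import reduce

# Numeric table of the example (objects g1..g3, attributes m1..m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
}


def delta(g):
    """Describe an object as a vector of degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity: component-wise [min of lower bounds, max of upper bounds]."""
    return tuple(
        (min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)
    )


def subsumed(d, e):
    """d ⊑ e, taken here as d ⊓ e == d."""
    return meet(d, e) == d


def description_of(objects):
    """A□: the meet of the descriptions of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def objects_of(d):
    """d□: every object whose description is subsumed by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = description_of({"g1", "g2"})
print(d)
print(objects_of(d))
```

Applying both operators in turn to {g1, g2} returns the same set, so the pair forms a pattern concept for this table.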
Complex Data
- Numeric Data: Interval Pattern Structures [Kaytoue et al 2011]
- Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
- Syntactic Tree [Leeuwenberg et al 2015]
- Linked Data
Problem Statement
- Linked Data follows a distributed architecture
- Some resources only contain RDF triples
- Some resources only contain schema information
- These resources belong to the same domain but share only the terms
Solution
- Classify RDF triples based on RDF Schema
- Allow simultaneous access to RDF triples and RDF Schema
- Allow the user to interact with the resulting classification
Roadmap
- Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
- Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: a triple drawn as Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$), where U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: RDF graph with nodes NapoliLS97, title1, Amedeo_Napoli and Classification, and edges labeled dc:creator, dc:title and dc:subject]
To avoid confusion, we will refer to objects in FCA as entities.
RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, rooted at ⊤, over the classes C1, C2 and C4 to C15]
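One way to read the step from triples to entity descriptions is sketched below, using only the six triples t1 to t6 of the table. It assumes that a dc:subject value is replaced by its class when an rdfs:subClassOf triple is known for it and is kept unchanged otherwise; that rule and the function names are assumptions of this sketch, not the procedure of the talk.

```python
# Triples t1..t6: (subject, predicate, object, provenance).
TRIPLES = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]


def split_triples(triples):
    """Separate schema triples (taxonomy edges) from dc:subject data triples."""
    parent = {}    # child -> parent class, from rdfs:subClassOf
    subjects = {}  # entity -> set of dc:subject values
    for s, p, o, _provenance in triples:
        if p == "rdfs:subClassOf":
            parent[s] = o
        elif p == "dc:subject":
            subjects.setdefault(s, set()).add(o)
    return parent, subjects


def describe(triples):
    """Entity descriptions: each dc:subject value mapped to its class if known."""
    parent, subjects = split_triples(triples)
    return {
        entity: {parent.get(value, value) for value in values}
        for entity, values in subjects.items()
    }


print(describe(TRIPLES))
```

Keeping the taxonomy edges in a separate `parent` map gives access to both the data triples and the schema at the same time.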
Structured Attribute Sets
- Each subject has a structured set of attributes
- In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
- Scaling
- Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
- Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
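Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
The least common ancestor (LCA) of two classes is the node of minimal depth between their positions in the Euler tour of the taxonomy.
[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]
Euler tour of the tree (position, depth D, node):

pos      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
node     ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤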
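The construction of the array D can be sketched as follows: a depth-first traversal records a node and its depth on entry and again after each child returns. The tree is the one read off the Euler tour above, with the root written as "TOP"; the range minimum is a plain linear scan, where a real implementation would likely preprocess the array (a sparse table, for instance). Names are illustrative.

```python
# Taxonomy read off the Euler tour; "TOP" stands for the root.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths along an Euler tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            # Coming back from a child, the parent is recorded again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, a, b):
    """LCA of a and b: the node of minimal depth between their first visits."""
    i, j = sorted((nodes.index(a), nodes.index(b)))
    k = min(range(i, j + 1), key=depths.__getitem__)
    return nodes[k]


NODES, DEPTHS = euler_tour("TOP")
print(DEPTHS)
print(lca(NODES, DEPTHS, "C1", "C5"), lca(NODES, DEPTHS, "C7", "C9"))
```

The list indices are 0-based, so position p on the slide corresponds to index p - 1 here.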
Range Minimum Query - An Implementation
- How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?
Positions in the Euler tour: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$ after keeping only the most specific classes, i.e. $\{C_1, C_{13}\}$
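The steps of this example can be sketched as below: merge the tour positions of both sets, run an RMQ on neighbouring positions, and keep the most specific results. The tour arrays are copied from the slide with the root written as "TOP". Skipping neighbouring pairs that come from the same set is an assumption of this sketch (in the slide example every neighbouring pair mixes A and B), and the linear-scan `rmq` stands in for a preprocessed structure.

```python
# Euler tour from the slide (0-based indices here; the slide counts from 1).
NODES = (
    "TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
    "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP"
).split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """Index of the minimal depth in DEPTH[i..j], both ends included."""
    return min(range(i, j + 1), key=DEPTH.__getitem__)


def lca(a, b):
    """Least common ancestor of two classes via RMQ on first occurrences."""
    i, j = sorted((NODES.index(a), NODES.index(b)))
    return NODES[rmq(i, j)]


def similarity(a_set, b_set):
    """Most specific common ancestors of two sets of classes."""
    tagged = sorted(
        [(NODES.index(c), "A") for c in a_set]
        + [(NODES.index(c), "B") for c in b_set]
    )
    candidates = set()
    for (i, origin_i), (j, origin_j) in zip(tagged, tagged[1:]):
        if origin_i != origin_j:  # only compare a class of A with a class of B
            candidates.add(NODES[rmq(i, j)])
    # Drop every candidate that is a proper ancestor of another candidate.
    return {
        c for c in candidates
        if not any(c != d and lca(c, d) == c for d in candidates)
    }


print(similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

The intermediate candidates are C1, C12, TOP and C13; C12 and TOP are ancestors of C1 and are therefore dropped.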
Resulting Pattern Concept Lattice
[Figure: pattern concept lattice with concepts K0 to K11, built from the entities s1 to s5 and the ACM Computing Classification taxonomy]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of the pattern concept lattice
Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec
Experiments with numerical data from Bilkent University
- Formal contexts were created through inter-ordinal scaling
- Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
- Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
- Entities are papers and descriptions are the classes from ACCS
- Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice with concepts K0 to K11, with the extents and intents listed under Resulting Pattern Concept Lattice]
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern of the query. ?paper is linked to ?title by dc:title, to ?author by dc:creator and to ?keywords by dc:subject]
Mapping $\mu : V \rightarrow U$
Given the set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification
[Figure: RDF graph with nodes NapoliLS97, title1, Amedeo_Napoli and Classification, and edges labeled dc:creator, dc:title and dc:subject]
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title   keyword             author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

- $E_v = \{?title\}$
- $A_v = \{?author, ?keyword\}$
- $G = \mu(E_v) = \{title1, title2\}$
- $M_1 = \mu(A_{v1}) = \{author1, author2\}$
- $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
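The effect of VIEW BY can be sketched as turning the rows of the result table into a formal context: the values of the chosen variable become the objects and the (variable, value) pairs of the remaining columns become the attributes. The rows are those of the table above; `view_by` is a hypothetical helper written for illustration and is not the implementation behind the VIEW BY clause.

```python
# Rows of the query result, in the column order of the table.
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, object_variable):
    """Build a formal context {object: set of (variable, value) attributes}."""
    position = columns.index(object_variable)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[position], set())
        for name, value in zip(columns, row):
            if name != object_variable:
                attributes.add((name, value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title))
print(sorted(by_author["author2"]))
```

Calling the helper with "author" instead of "title" gives the same answers from the author's perspective.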
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern. ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films, given as a category
[Figure: graph pattern. ?x has dcterms:subject FrenchFilm]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
French Films, described by type and country
[Figure: graph pattern. ?x has rdf:type Film and dbo:hasCountry France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

[Table: formal context with rows Person1 to Person7 and columns a to g, grouped under the predicates A to E. The column positions of the crosses are not recoverable from the transcript: Person1 has 6 crosses, Person2 and Person3 have 5 each, Person4 and Person5 have 4 each, Person6 and Person7 have 2 each]
The formal context is built, after scaling, from RDF triples taken from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
- If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
- $X \equiv Y$ is called a definition
- If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule                      Confidence  Support  Meaning
$d \Rightarrow c$         100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$         100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$      100%        2        Everyone whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$   100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$   71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
- Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
- Entities Person6 and Person7 need completion in their descriptions
- In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
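The completion test can be sketched as follows. The context below is hypothetical: it agrees with the confidences in the rule table and with the 5 and 2 entities behind the first rules, but the exact rows carrying e and f are an assumption of this sketch, as are the function names and the 0.7 threshold.

```python
from fractions import Fraction

# Hypothetical context consistent with the rule table (cross positions for
# e and f are assumed; the original alignment is not available).
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "f"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities whose description contains every given attribute."""
    return {g for g, desc in CONTEXT.items() if set(attributes) <= desc}


def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'|."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def completion_candidates(x, y, threshold):
    """If X => Y holds and conf(Y -> X) >= threshold, return the entities
    that have Y but not X; otherwise return the empty set."""
    if confidence(x, y) != 1 or confidence(y, x) < threshold:
        return set()
    return extent(y) - extent(x)


print(confidence({"a", "b"}, {"c", "d"}))
print(completion_candidates({"c", "d"}, {"a", "b"}, Fraction(7, 10)))
```

Under these assumptions the candidates are Person6 and Person7, the two entities that would receive c and d if the rule is accepted as a definition.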
Possible Scenario [Alam et al IJCAI 2015]
[Figure: workflow. Reference Universe, Formal Context, Mining Implications, Ranking Implications, then the test "Can a rule be a definition $X \equiv Y$?"; if Yes, Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank
FPS: Front_Person_Shooters, cp: computerPlatform
Predicates used to construct each dataset

               Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        0.7          5982
Dataset characteristics
Evaluation
- No ground truth for the triples to be added to each dataset
- Human evaluation as the ground truth: each list of implications has a 100% recall
- A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
- Main Topic of Research of a team
- Altering Navigation Space
- Navigation Across Points of View
- Hide non-interesting parts of the lattice
- Search Capabilities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 45
Mehwish Alam Interactive Knowledge Discovery over Web of Data 2
1Motivation
Towards Web of Data
Query
List of Turing Award Winners
Mehwish Alam Interactive Knowledge Discovery over Web of Data 3
Towards Web of Data
Query
List of Turing Award Winners
A more specic Query
List of American Turing Award Winners
Mehwish Alam Interactive Knowledge Discovery over Web of Data 3
Towards Web of Data
Query
List of Turing Award Winners
A more specic Query
List of American Turing Award Winners
Requires Resource Integration
sbquo List of Turing Award Winners
sbquo List of American Computer Scientists
Mehwish Alam Interactive Knowledge Discovery over Web of Data 3
From Web of Documents to Web of Data
Characteristics of Web of Documents
sbquo Unstructured Data
sbquo Human Processable
sbquo Not easily processed by machines
Mehwish Alam Interactive Knowledge Discovery over Web of Data 4
From Web of Documents to Web of Data
Characteristics of Web of Documents
sbquo Unstructured Data
sbquo Human Processable
sbquo Not easily processed by machines
Turing Award Washington DC
Leslie Lamport United States
Scientist English
dbobirthPlace
dbpaward
rdftype
dbocapital
dboocialLanguage
Mehwish Alam Interactive Knowledge Discovery over Web of Data 4
Google Knowledge Graph
Mehwish Alam Interactive Knowledge Discovery over Web of Data 5
Can we perform data analysis over Web of Data
Possible Use-Cases
sbquo groups of authors working together
sbquo detect the diversity of an author
sbquo detect major area of research for an author
sbquo given a paper is it possible to retrieve similar papers published
Mehwish Alam Interactive Knowledge Discovery over Web of Data 6
Can we perform data analysis over Web of Data
Possible Use-Cases
sbquo groups of authors working together
sbquo detect the diversity of an author
sbquo detect major area of research for an author
sbquo given a paper is it possible to retrieve similar papers published
What do we need
sbquo A strong formalism which not only allows information retrieval but alsoallows classication and knowledge discovery
sbquo A structure which- is close to the original format of the web of data- allows search and navigation- Enable interaction with the domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 6
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Related Work
Our contributions are in the direction of the following works
sbquo Sewelis (Camelis 2) [Ferreacute and Hermann 2012]
sbquo Navigala [Visani et al 2011]
sbquo Relational Exploration [Rudolph 2006]
Mehwish Alam Interactive Knowledge Discovery over Web of Data 8
Mehwish Alam Interactive Knowledge Discovery over Web of Data 9
2Background
Formal Concept Analysis
[Ganter and Wille 1999]
• A formal context is a triple $(G, M, I)$ where
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×

[Figure: concept lattice of the context]
Formal Concept Analysis
[Ganter and Wille 1999]
• Object concept for $g \in G$
  - given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$ for $g_2$
• Attribute concept for $m \in M$
  - given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$ for $m_1$
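The derivation operators and the object and attribute concepts can be sketched in a few lines of Python. This is only an illustration: the cross positions in `CONTEXT` are an assumption inferred from the worked examples on the slides, the function names are hypothetical, and the brute-force enumeration is practical only for tiny contexts.

```python
from itertools import combinations

# Example context (cross positions inferred from the slides' worked examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': attributes shared by every object in A (all of M when A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': objects that have every attribute in B."""
    return {g for g, row in CONTEXT.items() if set(attributes) <= row}


def object_concept(g):
    """(g'', g') for a single object g."""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    a = extent({m})
    return a, intent(a)


def all_concepts():
    """Close every attribute subset; exponential, so only for small examples."""
    found = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), size):
            a = extent(subset)
            found.add((frozenset(a), frozenset(intent(a))))
    return found
```

Calling `object_concept("g2")` and `attribute_concept("m1")` reproduces the two examples given above for this context.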
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.
• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association rule: $m_2 \rightarrow m_3$
  $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$
Implication: $m_3 \Rightarrow m_2$
  $\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
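As a quick check of these numbers, support and confidence can be computed directly from the extents. The sketch below assumes the same five-object context as in the Formal Concept Analysis slides (cross positions inferred from the worked examples); the function names are illustrative.

```python
# Example context (cross positions inferred from the slides' worked examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': objects that have every attribute in X."""
    return {g for g, row in CONTEXT.items() if set(attributes) <= row}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (undefined when no object has X)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)
```

With this context, `support({"m2"}, {"m3"})` is 0.6 and `confidence({"m2"}, {"m3"})` is 0.75, while `implication_holds({"m3"}, {"m2"})` is true and the converse is not.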
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric data

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5

• Graphs and molecular structures
• Syntactic trees
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, a set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
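The interval similarity operation above is easy to state in code. The following is a minimal sketch in which intervals are represented as `(low, high)` tuples; the helper names `delta`, `meet_interval` and `meet` are illustrative choices, not taken from the slides.

```python
def delta(values):
    """Describe an object by degenerate intervals [v, v], one per attribute."""
    return tuple((v, v) for v in values)


def meet_interval(i1, i2):
    """[a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]"""
    (a1, b1), (a2, b2) = i1, i2
    return (min(a1, a2), max(b1, b2))


def meet(d1, d2):
    """Component-wise similarity of two interval descriptions."""
    return tuple(meet_interval(x, y) for x, y in zip(d1, d2))


g1 = delta((5, 7, 6))
g2 = delta((6, 8, 4))
print(meet(g1, g2))
```

The similarity of two descriptions is the smallest tuple of intervals that covers both, so it can only widen as more objects are added.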
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:
• The maximal description representing the similarity of a set of objects:
  $A^\square = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^\square = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$, where $c \sqsubseteq d \Leftrightarrow c \sqcap d = c$
Pattern concept
• $\{g_1, g_2\}^\square = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^\square = \{g_1, g_2\}$
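Both directions of this Galois connection can be sketched on the three-object numeric table used for the interval example. The names `common_description` and `matching_objects` are hypothetical, and the subsumption test simply applies $c \sqsubseteq d \Leftrightarrow c \sqcap d = c$.

```python
from functools import reduce

# Three-object numeric table from the interval example.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of object g as a tuple of degenerate intervals."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Component-wise interval similarity."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed(d, e):
    """d ⊑ e iff d ⊓ e = d, i.e. every interval of e lies inside d's."""
    return meet(d, e) == d


def common_description(objects):
    """A□: similarity of all descriptions in a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def matching_objects(d):
    """d□: all objects whose description is subsumed by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = common_description(["g1", "g2"])
print(d, matching_objects(d))
```

Here `g3` is excluded because its first value, 4, falls outside the interval [5, 6].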
Complex Data
• Numeric data: interval pattern structures [Kaytoue et al. 2011]
• Graphs and molecular structures [Kuznetsov and Samokhin 2005]
• Syntactic trees [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources contain only RDF triples
• Some resources contain only schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) -> predicate ($U$) -> Object ($U \cup B \cup L$); U = URIs, B = blank nodes, L = literals]
Example of RDF Graph for Publications
[Figure: NapoliLS97 -> dc:creator -> Amedeo_Napoli; NapoliLS97 -> dc:title -> title1; NapoliLS97 -> dc:subject -> Classification]
To avoid confusion, we will call the objects of FCA entities.
RDF to Entity-Descriptions

  tid  Subject  Predicate        Object       Provenance
  t1   s1       dc:subject       o11          DBLP
  t2   s2       dc:subject       o16          DBLP
  t3   s1       rdf:type         Publication  DBLP
  t4   o11      rdfs:subClassOf  C1           ACCS
  t5   o12      rdfs:subClassOf  C2           ACCS
  t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

  Entities S  Descriptions d
  s1          (dc:subject, {C1, C2, C7})
  s2          (dc:subject, {C6, C8, C9})
  s3          (dc:subject, {C4, C5})
  s4          (dc:subject, {C4, C7, C8})
  s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy $T$ with root $\top$ and classes C1, C2, C4 to C15]
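One way to picture the step from triples to entity descriptions is to group the `dc:subject` objects by subject and replace each keyword by its taxonomy class when an `rdfs:subClassOf` triple provides one. This is a sketch under that assumption, not the procedure stated on the slides; `entity_descriptions` is a hypothetical name, and only the six triples listed above are used, so the output is smaller than the full descriptions table.

```python
from collections import defaultdict

# The six triples shown in the table (provenance column omitted).
TRIPLES = [
    ("s1", "dc:subject", "o11"),
    ("s2", "dc:subject", "o16"),
    ("s1", "rdf:type", "Publication"),
    ("o11", "rdfs:subClassOf", "C1"),
    ("o12", "rdfs:subClassOf", "C2"),
    ("C1", "rdfs:subClassOf", "C10"),
]


def entity_descriptions(triples, predicate="dc:subject"):
    """Group the objects of `predicate` by subject, lifted to taxonomy classes."""
    # Assumes each term has at most one parent, i.e. the schema is a tree.
    parent = {s: o for s, p, o in triples if p == "rdfs:subClassOf"}
    descriptions = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            # Keep the raw keyword when the schema has no class for it.
            descriptions[s].add(parent.get(o, o))
    return dict(descriptions)


print(entity_descriptions(TRIPLES))
```

The data triples and the schema triples come from different sources (DBLP and ACCS), which is why the lookup table is built from the schema part only.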
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of structured attribute sets?
• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query: an implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree: ⊤ -> {C12, C6}; C12 -> {C10, C11}; C10 -> {C1, C2}; C11 -> {C4, C5}; C6 -> {C13}; C13 -> {C7, C8, C9}]
Euler tour of the taxonomy with the depth array D. The least common ancestor (LCA) of two nodes is the node of minimal depth between their positions in the tour:

  pos   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
  node  ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
  D     0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
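The depth array D can be produced by a depth-first Euler tour, after which an LCA query becomes a range minimum over D. The sketch below assumes the tree structure read off the tour on the slide, writes the root ⊤ as `"TOP"`, and uses a naive linear scan for the range minimum; a sparse table or similar index would be the usual choice for larger taxonomies.

```python
# Taxonomy as child lists; the root ⊤ is written "TOP".
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths along a depth-first tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, x, y):
    """Least common ancestor: shallowest node between the first visits of x and y."""
    i, j = sorted((nodes.index(x), nodes.index(y)))
    k = min(range(i, j + 1), key=depths.__getitem__)  # naive range minimum
    return nodes[k]


NODES, D = euler_tour(TREE, "TOP")
print(D)
print(lca(NODES, D, "C1", "C5"))
```

The tour has 25 entries for this tree, matching the positions 1 to 25 on the slide (the code uses 0-based indices internally).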
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, using the Euler tour and depth array D of the taxonomy?
Positions in the tour: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
Merged in tour order: $A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ over consecutive positions: RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21}, reduced to {4, 21} by keeping only the most specific classes, i.e. $A \sqcap B = \{C_1, C_{13}\}$
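The worked example can be replayed with a short sketch. The arrays are copied from the slide (positions are 1-based, ⊤ is written `"TOP"`), the range minimum is a naive scan, and `intersect` is a hypothetical name. One assumption goes beyond the slide: only consecutive positions coming from different sets are paired. In this example every consecutive pair already mixes A and B, so the outcome is the same.

```python
# Euler tour and depth array D from the slide (positions are 1-based).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First position of each class in the tour.
FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)


def rmq(i, j):
    """1-based position of the shallowest node in D[i..j] (naive scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_strict_ancestor(x, y):
    """True when class x lies strictly above class y in the taxonomy."""
    return x != y and NODES[rmq(FIRST[x], FIRST[y]) - 1] == x


def intersect(a, b):
    """Similarity of two antichains: most specific common ancestors."""
    merged = sorted([(FIRST[c], "A") for c in a] + [(FIRST[c], "B") for c in b])
    # Pair neighbours in tour order that come from different sets (assumption).
    hits = [rmq(p, q) for (p, s), (q, t) in zip(merged, merged[1:]) if s != t]
    candidates = {NODES[h - 1] for h in hits}
    result = {c for c in candidates
              if not any(is_strict_ancestor(c, d) for d in candidates)}
    return result, hits


classes, positions = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(positions, classes)
```

The final filtering step is what turns {4, 8, 15, 19, 21} into {4, 21}: C12 and ⊤ are ancestors of C1, and positions 19 and 21 both denote C13.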
Resulting Pattern Concept Lattice
[Figure: pattern concept lattice with concepts K0 to K11, built from the entity descriptions and the ACCS taxonomy]

  KID  Extent          Intent
  K1   s1, s2, s4, s5  (p1, {C14})
  K2   s1, s3, s4      (p1, {C12})
  K3   s1, s4          (p1, {C7, C12})
  K4   s2, s4, s5      (p1, {C8})
  K5   s3, s4          (p1, {C4})
  K6   s1              (p1, {C1, C2, C7})
  K7   s4              (p1, {C4, C7, C8})
  K8   s2, s5          (p1, {C8, C9})
  K9   s3              (p1, {C4, C5})
  K10  s2              (p1, {C6, C8, C9})
Details of the pattern concept lattice
Experimentation

  Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
  DBLP             5293  33207  33198        10134    45s    21s
  Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

  Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
  BK       96        5           35   626   10           840897   37 sec     42 sec
  LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
  NT       131       3           131  140   6            128624   36 sec     68 sec
  PO       60        16          22   1236  58           416837   49 sec     57 sec
  PT       5000      49          22   4084  60           452316   50 sec     38 sec
  PW       200       11          94   436   21           1148656  60 sec     49 sec
  PY       74        28          36   340   53           771569   46 sec     40 sec
  QU       2178      4           44   8212  8            783013   28 sec     30 sec
  TZ       186       61          31   626   88           650041   58 sec     43 sec
  VY       52        4           52   202   15           202666   59 sec     116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through interordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice with concepts K0 to K11 and the table of their extents and intents]
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }
[Figure: graph pattern of the query: ?paper -> dc:creator -> ?author; ?paper -> dc:title -> ?title; ?paper -> dc:subject -> ?keywords]
Mapping $\mu : V \rightarrow U$
Given a set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification
[Figure: matching RDF graph: NapoliLS97 -> dc:creator -> Amedeo_Napoli; NapoliLS97 -> dc:title -> title1; NapoliLS97 -> dc:subject -> Classification]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

  ?title  ?keyword            ?author
  title1  RDF                 author1
  title2  Web Crawling        author1
  title2  Web Search Engines  author2
  title2  RDF                 author1
  title2  RDF                 author2
  title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
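The effect of VIEW BY can be pictured as regrouping the tabular query answers: the values of the view variable become the entities, and the values of the remaining variables become their attributes. The sketch below is an illustration only; `view_by` is a hypothetical helper, not part of SPARQL or of the authors' system, and `ROWS` is a small subset of the answer table.

```python
from collections import defaultdict

# Illustrative subset of the query answers (one dict per solution mapping).
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, entity_var):
    """Map each value of `entity_var` to its set of (variable, value) attributes."""
    context = defaultdict(set)
    for row in rows:
        entity = row[entity_var]
        for var, value in row.items():
            if var != entity_var:
                context[entity].add((var, value))
    return dict(context)


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(by_title)
print(by_author)
```

Switching the argument from `"title"` to `"author"` yields a different context over the same answers, which is what viewing the results from another perspective amounts to.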
Views from Different Perspectives
Example
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
Example
  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: ?x -> rdf:type -> Person; ?x -> dbp:birthPlace -> Berlin; ?x -> dbp:birthDate -> (≤ 1900)]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }
French Films
[Figure: ?x -> dcterms:subject -> FrenchFilm; ?x -> rdf:type -> Film; ?x -> dbo:hasCountry -> France]
Query using the category:
  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm .
  }
Query using the type and the country:
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France .
  }
From RDF Triples to Formal Context
RDF triples
  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer_Sciences>
  <Person1, rdf:type, dbo:Scientist>

  Predicates            Objects
  Index  URI            Index  URI
  A      dc:subject     a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
  B      dbp:award      c      dbp:TuringAward
  C      rdf:type       d      dbo:Scientist
  D      dbp:field      e      dbp:Computer_Sciences
  E      dbp:birthPlace f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Formal context over the attribute columns a to g, grouped by predicate (A = {a, b}, B = {c}, C = {d}, D = {e}, E = {f, g}):
  Person1  × × × × × ×
  Person2  × × × × ×
  Person3  × × × × ×
  Person4  × × × ×
  Person5  × × × ×
  Person6  × ×
  Person7  × ×
The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
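The scaling step can be sketched as turning every distinct (predicate, object) pair into one binary attribute, so that each triple contributes exactly one cross. The code below uses only the four Person1 triples listed above; the function names are illustrative, and the output uses `x` and `.` in place of crosses and blanks.

```python
from collections import defaultdict

# The four example triples for Person1.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def to_formal_context(triples):
    """Each subject is an entity; each (predicate, object) pair is an attribute."""
    context = defaultdict(set)
    for s, p, o in triples:
        context[s].add((p, o))
    attributes = sorted({a for row in context.values() for a in row})
    return dict(context), attributes


def render(context, attributes):
    """One text row per entity, 'x' where the entity has the attribute."""
    rows = []
    for entity in sorted(context):
        marks = ["x" if a in context[entity] else "." for a in attributes]
        rows.append(entity + " " + " ".join(marks))
    return rows


ctx, attrs = to_formal_context(TRIPLES)
print("\n".join(render(ctx, attrs)))
```

Keeping the predicate in the attribute name matters because the same object can appear under different predicates.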
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

  Rule                     Confidence  Support  Meaning
  $d \Rightarrow c$        100%        5        Every scientist has won a Turing Award
  $c \Rightarrow d$        100%        5        Every person who has won a Turing Award is a scientist
  $e \Rightarrow c, d$     100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
  $c, d \Rightarrow a, b$  100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
  $a, b \rightarrow c, d$  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact there is such a definition: $\{a, b\} \equiv \{c, d\}$, i.e. $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$
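The reasoning in this example can be sketched as a small check: when $Y \Rightarrow X$ holds and $X \rightarrow Y$ is nearly an implication, the entities that have $X$ but not $Y$ are candidates for completion. The context below is an assumption restricted to attributes a to d, chosen to agree with the confidences in the rule table (five persons with a, b, c, d and two with only a, b); the 0.7 threshold and the function names are hypothetical.

```python
# Assumed context on attributes a-d, consistent with the rule table.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})


def extent(attributes):
    """Entities that have every attribute in the given set."""
    return {g for g, row in CONTEXT.items() if set(attributes) <= row}


def confidence(x, y):
    """|X' ∩ Y'| / |X'|"""
    return len(extent(x) & extent(y)) / len(extent(x))


def completion_candidates(x, y, threshold=0.7):
    """Entities having X but lacking Y, when Y => X holds and conf(X -> Y) is high."""
    if not extent(y) <= extent(x):       # Y => X must be an implication
        return set()
    if confidence(x, y) < threshold:     # X -> Y must be nearly an implication
        return set()
    return extent(x) - extent(y)


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(sorted(completion_candidates({"a", "b"}, {"c", "d"})))
```

Whether the missing attributes should actually be added is a decision for the domain expert; the check only points to where the data looks incomplete.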
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow: Reference Universe -> Formal Context -> Mining Implications -> Ranking Implications -> "Can a rule be a definition $X \equiv Y$?" -> Yes -> Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions

               Cars              Videogames   Smartphones       Countries
  Restriction  dc:subject        dc:subject   dc:subject        rdf:type
               dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
  Predicates   rdf:type          rdf:type     rdf:type          rdf:type
               dc:subject        dc:subject   dc:subject        dc:subject
               bodyStyle         cp           manufacturer      language
               transmission      developer    operativeSystem   governmentType
               assembly          requirement  developer         leaderType
               designer          genre        cpu               foundingDate
               layout            releaseDate                    gdpPppRank
  FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset

Dataset characteristics

                  Cars   Videogames  Smartphones  Countries
  Subjects        529    655         363          3153
  Objects         1291   3265        495          8315
  Triples         12519  20146       4710         50000
  Concepts        14657  31031       1232         13754
  Exec. time [s]  1732   1714        0.7          5982
Evaluation
• No ground truth exists for the triples to be added to each dataset
• Human evaluation is used as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities
Towards Web of Data
Query
List of Turing Award Winners
Mehwish Alam Interactive Knowledge Discovery over Web of Data 3
Towards Web of Data
Query
List of Turing Award Winners
A more specic Query
List of American Turing Award Winners
Mehwish Alam Interactive Knowledge Discovery over Web of Data 3
Towards Web of Data
Query
List of Turing Award Winners
A more specic Query
List of American Turing Award Winners
Requires Resource Integration
sbquo List of Turing Award Winners
sbquo List of American Computer Scientists
Mehwish Alam Interactive Knowledge Discovery over Web of Data 3
From Web of Documents to Web of Data
Characteristics of Web of Documents
sbquo Unstructured Data
sbquo Human Processable
sbquo Not easily processed by machines
Mehwish Alam Interactive Knowledge Discovery over Web of Data 4
From Web of Documents to Web of Data
Characteristics of Web of Documents
sbquo Unstructured Data
sbquo Human Processable
sbquo Not easily processed by machines
Turing Award Washington DC
Leslie Lamport United States
Scientist English
dbobirthPlace
dbpaward
rdftype
dbocapital
dboocialLanguage
Mehwish Alam Interactive Knowledge Discovery over Web of Data 4
Google Knowledge Graph
Mehwish Alam Interactive Knowledge Discovery over Web of Data 5
Can we perform data analysis over Web of Data
Possible Use-Cases
sbquo groups of authors working together
sbquo detect the diversity of an author
sbquo detect major area of research for an author
sbquo given a paper is it possible to retrieve similar papers published
Mehwish Alam Interactive Knowledge Discovery over Web of Data 6
Can we perform data analysis over Web of Data
Possible Use-Cases
sbquo groups of authors working together
sbquo detect the diversity of an author
sbquo detect major area of research for an author
sbquo given a paper is it possible to retrieve similar papers published
What do we need
sbquo A strong formalism which not only allows information retrieval but alsoallows classication and knowledge discovery
sbquo A structure which- is close to the original format of the web of data- allows search and navigation- Enable interaction with the domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 6
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Related Work
Our contributions are in the direction of the following works
sbquo Sewelis (Camelis 2) [Ferreacute and Hermann 2012]
sbquo Navigala [Visani et al 2011]
sbquo Relational Exploration [Rudolph 2006]
Mehwish Alam Interactive Knowledge Discovery over Web of Data 8
Mehwish Alam Interactive Knowledge Discovery over Web of Data 9
2Background
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
sbquo Galois connection
A1 ldquo tm P M | g P A Ď G pg mq P Iu
B 1 ldquo tg P G | m P B Ď M pg mq P Iu
sbquo pABq is a formal concept with extent A ldquo B 1
and intent B ldquo A1
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
sbquo Galois connection
A1 ldquo tm P M | g P A Ď G pg mq P Iu
B 1 ldquo tg P G | m P B Ď M pg mq P Iu
sbquo pABq is a formal concept with extent A ldquo B 1
and intent B ldquo A1
sbquo Reduced Labeling- intents are inherited from top to bottom(top-down)
- and extents are inherited from bottom totop (bottom-up)
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Concept Lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo Object Concept for g P G - Object concept is given as pg2 g 1q- eg ptg2 g5u tm1m2uq
sbquo Attribute Concept for m P M- Attribute concept is given as pm1m2q- eg ptg1 g2 g5u tm1uq
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Concept Lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 11
Association Rules and Implications
Let pG M I q be a formal context and X Y Ď Msbquo X Ntilde Y is an association rule with
- Support σpX Ntilde Y q ldquo|X 1XY 1|
|G |
- Confidence conf pX Ntilde Y q ldquo|X 1XY 1|
|X 1|
sbquo The implication X ugraventilde Y holds i X 1 Ď Y 1
Association Rulem2 Ntilde m3
σpm2 Ntilde m3q ldquo 35 ldquo 60conf pm2 Ntilde m3q ldquo 34 ldquo 75
Implication
m3 ugraventilde m2
conf pm3 ugraventilde m2q ldquo 33 ldquo 100
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 12
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Complex Data
Formal Concept Analysis cannot deal with such complex data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5
sbquo δpg1q ldquo xr5 5s r7 7s r6 6sy
sbquo δpg2q ldquo xr6 6s r8 8s r4 4sy
sbquo ra1 b1s[ra2 b2s ldquo rminpa1 a2qmaxpb1 b2qs
sbquo δpg1q [ δpg2q ldquo xr5 6s r7 8s r4 6sy
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al 2011]

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources contain only RDF triples
• Some resources contain only schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization

3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

  Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)
  U = URI, B = Blank Nodes, L = Literal

Example of RDF Graph for Publications

  NapoliLS97 --dc:title--> title1
  NapoliLS97 --dc:creator--> Amedeo_Napoli
  NapoliLS97 --dc:subject--> Classification

To avoid confusion with RDF objects, we will refer to the objects of FCA as entities.
RDF to Entity-Descriptions

tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy $T$, rooted at ⊤, over the classes C1, C2 and C4 to C15]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

Taxonomy:

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

The taxonomy is traversed depth-first and every visit of a node is recorded together with its depth D. The least common ancestor (LCA) of two classes is then the node of minimal depth between their positions in the array.

index  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
node   ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
D      0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0
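The construction of the depth array and the LCA lookup can be sketched in a few lines of Python. This is an illustrative sketch only: the child lists are read off the traversal shown on the slide, `"T"` stands for ⊤, indices are 0-based (the slide counts from 1), and `rmq` is a naive linear scan rather than the constant-time structure a real implementation would presumably use.

```python
# Taxonomy from the slide as child lists ("T" stands for the top element).
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Return (labels, depths): nodes in visiting order and their depths."""
    labels, depths = [], []

    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            labels.append(node)      # record the return to the parent
            depths.append(depth)

    visit(root, 0)
    return labels, depths

LABELS, DEPTHS = euler_tour("T")

# First position of every node in the tour.
FIRST = {}
for position, node in enumerate(LABELS):
    FIRST.setdefault(node, position)

def rmq(i, j):
    """Index of the minimal depth in DEPTHS[i..j] (naive linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=DEPTHS.__getitem__)

def lca(u, v):
    """Least common ancestor: shallowest node between the first visits of u and v."""
    return LABELS[rmq(FIRST[u], FIRST[v])]

print(lca("C1", "C5"), lca("C7", "C9"), lca("C5", "C7"))
```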
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

index  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
node   ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
D      0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0

Positions of the classes in the array:
$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

An RMQ is computed for each pair of consecutive positions:
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$ once the positions of classes that are ancestors of, or equal to, another candidate are discarded, i.e. $A \sqcap B = \{C1, C13\}$.
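The worked example can be replayed with the arrays copied from the slide. The sketch below is a hedged illustration, not the published algorithm: `similarity`, `is_ancestor` and the `"A"`/`"B"` tags are names introduced here, `"T"` stands for ⊤, indices are 0-based, and it is an assumption of this sketch that only consecutive positions coming from different sets are queried (in the slide's example every consecutive pair happens to be of that kind).

```python
# Arrays copied from the slide; Python indices are 0-based, the slide's are 1-based.
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
LABELS = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
          "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()

FIRST, LAST = {}, {}
for position, node in enumerate(LABELS):
    FIRST.setdefault(node, position)   # first visit
    LAST[node] = position              # last visit

def rmq(i, j):
    """Index of the minimal depth in DEPTHS[i..j] (naive scan, i <= j)."""
    return min(range(i, j + 1), key=DEPTHS.__getitem__)

def is_ancestor(u, v):
    """True if u equals v or is an ancestor of v (u's tour interval contains v's)."""
    return FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]

def similarity(a, b):
    """Most specific common ancestors of the class sets a and b."""
    tagged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[x], "B") for x in b])
    candidates = {
        LABELS[rmq(i, j)]
        for (i, s), (j, t) in zip(tagged, tagged[1:])
        if s != t                      # assumption: only pairs across the two sets
    }
    # Drop every candidate that is an ancestor of another candidate.
    return {
        c for c in candidates
        if not any(o != c and is_ancestor(c, o) for o in candidates)
    }

result = similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(result)
```

Restricting the queries to pairs from different sets matters when two neighbouring positions belong to the same description: for {C1, C2} and {C9} the only common ancestor is ⊤, whereas the LCA of C1 and C2 alone would be C10.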
Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]

[Figure: Pattern Concept lattice; node labels by row, top to bottom: K0 | K1 K2 | K4 K3 K5 | K8 | K10 K7 K6 K9 | K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of Pattern Concept lattice
Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Suppose the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice; node labels by row, top to bottom: K0 | K1 K2 | K4 K3 K5 | K8 | K10 K7 K6 K9 | K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }

Graph pattern of the query:

  ?paper --dc:creator--> ?author
  ?paper --dc:title--> ?title
  ?paper --dc:subject--> ?keywords

Mapping $\mu : V \to U$
Given $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification

RDF graph being matched:

  NapoliLS97 --dc:creator--> Amedeo_Napoli
  NapoliLS97 --dc:title--> title1
  NapoliLS97 --dc:subject--> Classification
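How a mapping binds the variables of a graph pattern to the terms of an RDF graph can be illustrated with a small pure-Python matcher. This is a sketch under simplifying assumptions: terms are plain strings, prefixed names stand in for full URIs, the FILTER clause is ignored, and the helper names (`is_var`, `extend`, `solutions`) are hypothetical. It is not a SPARQL engine.

```python
# RDF graph from the slide as (subject, predicate, object) tuples.
GRAPH = {
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
}

# Triple patterns of the query; variables start with "?".
PATTERNS = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

def is_var(term):
    return term.startswith("?")

def extend(mu, pattern, triple):
    """Extend mapping mu so that pattern matches triple; return None on conflict."""
    mu = dict(mu)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if mu.setdefault(p, t) != t:   # variable already bound to another term
                return None
        elif p != t:                       # constant does not match
            return None
    return mu

def solutions(patterns, graph):
    """All mappings under which every pattern matches some triple of the graph."""
    mappings = [{}]
    for pattern in patterns:
        extended = []
        for mu in mappings:
            for triple in graph:
                candidate = extend(mu, pattern, triple)
                if candidate is not None:
                    extended.append(candidate)
        mappings = extended
    return mappings

sols = solutions(PATTERNS, GRAPH)
print(sols)
```

The single solution binds `?keywords` to `Classification`, which is the mapping $\mu_1$ given on the slide.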
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{$?title$\}$
• $A_v = \{$?author, ?keyword$\}$
• $G = \mu(E_v) = \{$title1, title2$\}$
• $M_1 = \mu(A_{v1}) = \{$author1, author2$\}$
• $M_2 = \mu(A_{v2}) = \{$Web Crawling, RDF$\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
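One way to picture what VIEW BY does with the result table is to group the rows by the chosen variable and collect, for every other variable, the values each entity co-occurs with. The Python sketch below is only an illustration of that idea; `view_by` and its return shape are assumptions made here and do not reproduce the LBVA implementation.

```python
# Result rows from the slide, in the column order (title, keyword, author).
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]

def view_by(rows, columns, entity_var):
    """Return {attribute variable: {entity: set of values}} for a VIEW BY variable."""
    e = columns.index(entity_var)
    contexts = {}
    for k, var in enumerate(columns):
        if k == e:
            continue
        relation = {}
        for row in rows:
            relation.setdefault(row[e], set()).add(row[k])
        contexts[var] = relation
    return contexts

by_title = view_by(ROWS, COLUMNS, "title")     # entities are titles
by_author = view_by(ROWS, COLUMNS, "author")   # same rows, entities are authors
print(by_title["author"])
print(by_author["title"])
```

Changing only the `entity_var` argument yields a different set of entities and attributes from the same answers, which is what lets one query be explored from several perspectives.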
Views from Different Perspectives
Example

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

Example

  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900

  ?x --rdf:type--> Person
  ?x --dbp:birthPlace--> Berlin
  ?x --dbp:birthDate--> (≤ 1900)

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }

French Films

  ?x --dcterms:subject--> FrenchFilm

  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm .
  }

The same set of resources described by type and country:

  ?x --rdf:type--> Film
  ?x --dbo:hasCountry--> France

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France .
  }
From RDF Triples to Formal Context
RDF triples

  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom

Formal context, with columns A (a, b), B (c), C (d), D (e), E (f, g):

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
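The scaling step, in which every (predicate, object) pair becomes one binary attribute of the subject, can be sketched as follows. This is a minimal Python illustration using only the four triples listed on the slide; the function names are hypothetical, and the sketch does not claim to reproduce the crosses of the full seven-person context.

```python
# The four example triples from the slide (prefixed names kept as plain strings).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

def formal_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for s, p, o in triples:
        context.setdefault(s, set()).add((p, o))
    return context

def attributes(context):
    """All attributes of the context, i.e. the columns of the cross table."""
    return sorted(set().union(*context.values()))

context = formal_context(TRIPLES)
columns = attributes(context)
for entity, description in context.items():
    row = ["x" if column in description else "." for column in columns]
    print(entity, " ".join(row))
```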
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule                    Confidence   Support   Meaning
$d \Rightarrow c$       100%         5         Every scientist has won a Turing Award
$c \Rightarrow d$       100%         5         Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$    100%         2         Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$ 100%         5         All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ 71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
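The confidence figures and the entities that need completion can be recomputed with a short Python sketch. The context below is an assumption: it keeps only the attributes a to d, with rows inferred from the supports in the rule table (a and b for all seven persons, c and d for five of them), and it leaves out e, f and g. The function names are illustrative.

```python
# Assumed restriction of the running example to attributes a-d; the rows are
# inferred from the rule table, not copied from the original cross table.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attrs):
    """X' : all entities having every attribute in attrs."""
    return {g for g, description in CONTEXT.items() if attrs <= description}

def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'|."""
    xs = extent(x)
    return len(xs & extent(y)) / len(xs)

def needs_completion(x, y):
    """Entities satisfying X but not Y: candidates if X ≡ Y is accepted as a definition."""
    return extent(x) - extent(y)

forward = confidence({"c", "d"}, {"a", "b"})    # an implication: confidence 1.0
backward = confidence({"a", "b"}, {"c", "d"})   # an association rule: 5/7
print(forward, round(backward, 2), sorted(needs_completion({"a", "b"}, {"c", "d"})))
```

If a domain expert accepts $a, b \equiv c, d$ as a definition, the entities returned by `needs_completion` are exactly those whose descriptions would receive the missing attributes c and d.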
Possible Scenario [Alam et al IJCAI 2015]

  Reference Universe -> Formal Context -> Mining Implications -> Ranking Implications -> Can a rule be a definition $X \equiv Y$? -> Yes -> Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation

Dataset building conditions

Dataset       Cars               Videogames     Smartphones        Countries
Restriction   dc:subject         dc:subject     dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type       rdf:type           rdf:type
              dc:subject         dc:subject     dc:subject         dc:subject
              bodyStyle          cp             manufacturer       language
              transmission       developer      operativeSystem    governmentType
              assembly           requirement    developer          leaderType
              designer           genre          cpu                foundingDate
              layout             releaseDate                       gdpPppRank

FPS: Front_Person_Shooters, cp: computerPlatform

Predicates used to construct each dataset

Dataset characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         07            5982
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation serves as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Complex Data
Formal Concept Analysis cannot deal with such complex data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5
sbquo δpg1q ldquo xr5 5s r7 7s r6 6sy
sbquo δpg2q ldquo xr6 6s r8 8s r4 4sy
sbquo ra1 b1s[ra2 b2s ldquo rminpa1 a2qmaxpb1 b2qs
sbquo δpg1q [ δpg2q ldquo xr5 6s r7 8s r4 6sy
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
Taxonomy (depth in parentheses):
⊤ (0)
  C12 (1)
    C10 (2): C1, C2 (3)
    C11 (2): C4, C5 (3)
  C6 (1)
    C13 (2): C7, C8, C9 (3)
Euler tour of the taxonomy and its depth array D:
Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
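The depth array D can be obtained by a depth-first traversal that records a node every time it is entered or returned to. The following is a minimal sketch, not the authors' implementation; the dictionary `TAXONOMY`, the label `"TOP"` for ⊤ and the function name `euler_tour` are assumptions introduced for illustration, with the tree shape read off the tour shown on the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the example taxonomy: node -> ordered children.
# "TOP" stands for the root ⊤.
TAXONOMY: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree: Dict[str, List[str]], root: str) -> Tuple[List[str], List[int]]:
    """Return the Euler tour of the tree and the depth of each visited node."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            # Coming back from a child, the parent is recorded again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, D = euler_tour(TAXONOMY, "TOP")
print(list(zip(range(1, len(D) + 1), NODES, D)))
```

With this encoding the tour has 25 entries, and the first occurrence of a class (for instance position 4 for C1) is the position used when that class takes part in a range minimum query.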
Range Minimum Query: An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?
Positions refer to the Euler tour of the taxonomy (indices 1 to 25) and its depth array D from the previous slide.
A = {4, 12, 20}, B = {4, 18, 22} (position of the first occurrence of each class in the tour)
A ∪ B = {4A, 4B, 12A, 18B, 20A, 22B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. A ⊓ B = {C1, C13}: positions 19 and 21 both denote C13, while C12 (position 8) and ⊤ (position 15) are ancestors of C1 and are therefore dropped.
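The computation above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `position`, `rmq`, `lca` and `similarity` are hypothetical, the RMQ is a naive linear scan rather than a preprocessed structure, and for simplicity the sketch evaluates every pair (a, b) with a in A and b in B instead of only the neighbouring positions of the merged list used on the slide.

```python
from typing import List, Set

# Euler tour and depth array of the example taxonomy ("TOP" stands for ⊤).
NODES: List[str] = (
    "TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
    "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP"
).split()
D: List[int] = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
                1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def position(node: str) -> int:
    """1-based position of the first occurrence of a node in the tour."""
    return NODES.index(node) + 1


def rmq(i: int, j: int) -> int:
    """1-based position of the minimal depth between positions i and j (naive scan)."""
    lo, hi = sorted((i, j))
    return min(range(lo, hi + 1), key=lambda k: D[k - 1])


def lca(x: str, y: str) -> str:
    """Lowest common ancestor of two classes, read off the Euler tour."""
    return NODES[rmq(position(x), position(y)) - 1]


def similarity(a: Set[str], b: Set[str]) -> Set[str]:
    """Most specific common ancestors of two sets of classes."""
    candidates = {lca(x, y) for x in a for y in b}
    # Drop every candidate that is a proper ancestor of another candidate.
    return {c for c in candidates
            if not any(c != d and lca(c, d) == c for d in candidates)}


print(rmq(4, 12), rmq(12, 18), rmq(18, 20), rmq(20, 22))
print(similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

A production version would replace the linear scan in `rmq` with a preprocessed structure such as a sparse table, which is where the efficiency of the RMQ approach comes from.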
Resulting Pattern Concept Lattice
[Figure: pattern concept lattice built from the entities s1 to s5 and the taxonomy T, with concepts K0 to K11]
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Table: Details of the pattern concept lattice
Experimentation
Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s
Table: Experiments on Linked Data
Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ      t_scaling
BK        96         5            35    626    10            840897    37 sec     42 sec
LO        16         7            16    224    26            1875      0043 sec   0088 sec
NT        131        3            131   140    6             128624    36 sec     68 sec
PO        60         16           22    1236   58            416837    49 sec     57 sec
PT        5000       49           22    4084   60            452316    50 sec     38 sec
PW        200        11           94    436    21            1148656   60 sec     49 sec
PY        74         28           36    340    53            771569    46 sec     40 sec
QU        2178       4            44    8212   8             783013    28 sec     30 sec
TZ        186        61           31    626    88            650041    58 sec     43 sec
VY        52         4            52    202    15            202666    59 sec     116 sec
Table: Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach ()
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice with concepts K0 to K11]
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern of the query: ?paper with dc:creator ?author, dc:title ?title and dc:subject ?keywords]
Mapping μ : V → U
Given the set of URIs U and a set of variables V, a mapping μ is a partial function μ : V → U, e.g. μ1 : ?keywords → Classification
[Figure: matching RDF graph: NapoliLS97 with dc:title title1, dc:creator Amedeo_Napoli and dc:subject Classification]
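To make the notion of a mapping concrete, the sketch below evaluates the three triple patterns of the query against the small example graph and returns each solution as a dictionary from variables to terms. It is a toy matcher written for illustration only: it ignores FILTER, prefixes and literals, and the names `match_pattern` and `evaluate` are assumptions, not part of any SPARQL engine.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # a partial function from variables to terms

# Example graph from the slide, written with prefixed names as plain strings.
GRAPH: List[Triple] = [
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
]

PATTERNS: List[Triple] = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]


def match_pattern(pattern: Triple, triple: Triple, mapping: Mapping) -> Optional[Mapping]:
    """Extend `mapping` so that `pattern` matches `triple`, or return None."""
    extended = dict(mapping)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns: List[Triple], graph: List[Triple]) -> List[Mapping]:
    """Return all mappings that satisfy every triple pattern."""
    mappings: List[Mapping] = [{}]
    for pattern in patterns:
        mappings = [ext for m in mappings for t in graph
                    if (ext := match_pattern(pattern, t, m)) is not None]
    return mappings


for mu in evaluate(PATTERNS, GRAPH):
    print(mu)
```

Each returned dictionary plays the role of one mapping μ; for the example graph there is a single solution in which `?keywords` is mapped to `Classification`.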
Lattice-Based View Access [Alam et al. CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1
• Ev = {?title}
• Av = {?author, ?keyword}
• G = μ(Ev) = {title1, title2}
• M1 = μ(Av1) = {author1, author2}
• M2 = μ(Av2) = {Web Crawling, RDF}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
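The effect of VIEW BY can be illustrated by grouping the result rows on the chosen variable: its values become the objects of a formal context and the values of the remaining variables become the attributes. The sketch below assumes the rows are available as Python dictionaries; the function name `view_by` and the three sample rows are illustrative and not part of the authors' system.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]

# A few illustrative answer rows in the shape of the result table above.
ROWS: List[Row] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows: List[Row], object_var: str) -> Dict[str, Set[Tuple[str, str]]]:
    """Build a formal context: object -> set of (variable, value) attributes."""
    context: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for row in rows:
        for var, value in row.items():
            if var != object_var:
                context[row[object_var]].add((var, value))
    return dict(context)


print(view_by(ROWS, "title"))   # titles as objects
print(view_by(ROWS, "author"))  # authors as objects, another perspective
```

Calling `view_by` with a different variable yields a different context, and hence a different lattice, over the same query answers.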
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: query graph: x with rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: query graph: x with dcterms:subject FrenchFilm]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}
[Figure: query graph: x with rdf:type Film and dbo:hasCountry France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>
Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom
[Table: formal context over the entities Person1 to Person7 and the attributes a to g, grouped under the predicates A to E; Person1 has 6 crosses, Person2 and Person3 have 5, Person4 and Person5 have 4, Person6 and Person7 have 2]
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
• If X ⇒ Y and Y ⇒ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⇒ Y and Y → X has high confidence, we may be in the presence of incompleteness in the data
Rule          Confidence   Support   Meaning
d ⇒ c         100%         5         Every scientist has won a Turing Award
c ⇒ d         100%         5         Every person who has won a Turing Award is a scientist
e ⇒ c, d      100%         2         Every person whose field is computer science is a scientist who has won a Turing Award
c, d ⇒ a, b   100%         5         All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d   71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Table: Association rules for the running example
• Example
  - c, d ⇒ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition a, b ≡ c, d, i.e. a, b ⇒ c, d and c, d ⇒ a, b
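The reasoning on this slide can be reproduced on a small context. In the sketch below the crosses for a, b, c and d follow the rule statistics of the running example, whereas the placement of e, f and g is an assumption made only to match the number of crosses per person; the function names and the threshold of 0.7 are likewise illustrative.

```python
from typing import Dict, Set

# Illustrative context: a, b, c, d follow the running example; the
# placement of e, f, g is assumed.
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes: Set[str]) -> Set[str]:
    """All entities whose description contains every given attribute."""
    return {g for g, desc in CONTEXT.items() if attributes <= desc}


def confidence(x: Set[str], y: Set[str]) -> float:
    """conf(X -> Y) = |X' ∩ Y'| / |X'| (X' is assumed to be non-empty)."""
    return len(extent(x | y)) / len(extent(x))


def is_implication(x: Set[str], y: Set[str]) -> bool:
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


def completion_candidates(x: Set[str], y: Set[str], threshold: float = 0.7) -> Set[str]:
    """Entities that would need Y added for X ≡ Y to become a definition."""
    if is_implication(y, x) and threshold <= confidence(x, y) < 1.0:
        return extent(x) - extent(y)
    return set()


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(sorted(completion_candidates({"a", "b"}, {"c", "d"})))
```

Here c, d ⇒ a, b is an implication while a, b → c, d only reaches a confidence of 5/7, so Person6 and Person7 are reported as the entities whose descriptions would have to be completed.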
Possible Scenario [Alam et al. IJCAI 2015]
[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    governmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Table: Predicates used to construct each dataset
Dataset Characteristics
Dataset          Cars    Videogames   Smartphones   Countries
Subjects         529     655          363           3153
Objects          1291    3265         495           8315
Triples          12519   20146        4710          50000
Concepts         14657   31031        1232          13754
Exec. time [s]   1732    1714         07            5982
Table: Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Can we perform data analysis over Web of Data?
Possible Use-Cases
• groups of authors working together
• detect the diversity of an author
• detect the major area of research of an author
• given a paper, is it possible to retrieve similar published papers?
What do we need?
• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the Web of Data
  - allows search and navigation
  - enables interaction with the domain expert
Contributions
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
[Figure: Interactive Knowledge Discovery + Completion]
Related Work
Our contributions are in the direction of the following works:
• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille 1999]
• A formal context is a triple (G, M, I)
  - G is a set of objects
  - M is a set of attributes
  - I ⊆ G × M is a binary relation
• Galois connection
  A′ = {m ∈ M | ∀g ∈ A, (g, m) ∈ I} for A ⊆ G
  B′ = {g ∈ G | ∀m ∈ B, (g, m) ∈ I} for B ⊆ M
• (A, B) is a formal concept with extent A = B′ and intent B = A′
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)
      m1   m2   m3
g1    ×
g2    ×    ×
g3         ×    ×
g4         ×    ×
g5    ×    ×    ×
[Figure: Concept Lattice]
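The two derivation operators translate directly into set operations. The sketch below uses the five-object example context; the placement of the crosses is inferred from the object concept, attribute concept and rule statistics quoted on the neighbouring slides, and the function names are illustrative.

```python
from typing import Dict, Set

ATTRIBUTES: Set[str] = {"m1", "m2", "m3"}

# Example context: object -> attributes it has (cross placement inferred).
CONTEXT: Dict[str, Set[str]] = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def intent(objects: Set[str]) -> Set[str]:
    """A' : attributes shared by all objects in A."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes: Set[str]) -> Set[str]:
    """B' : objects having all attributes in B."""
    return {g for g, desc in CONTEXT.items() if attributes <= desc}


def is_formal_concept(objects: Set[str], attributes: Set[str]) -> bool:
    """(A, B) is a formal concept iff A = B' and B = A'."""
    return extent(attributes) == objects and intent(objects) == attributes


# Object concept of g2: (g2'', g2')
g2_intent = intent({"g2"})
print(sorted(extent(g2_intent)), sorted(g2_intent))
```

Applying `extent` after `intent` gives the closure of a set of objects, which is how the object concept (g″, g′) is obtained.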
Formal Concept Analysis
[Ganter and Wille 1999]
• Object Concept for g ∈ G
  - The object concept is given as (g″, g′)
  - e.g. ({g2, g5}, {m1, m2}) for g2
• Attribute Concept for m ∈ M
  - The attribute concept is given as (m′, m″)
  - e.g. ({g1, g2, g5}, {m1}) for m1
Association Rules and Implications
Let (G, M, I) be a formal context and X, Y ⊆ M
• X → Y is an association rule with
  - Support σ(X → Y) = |X′ ∩ Y′| / |G|
  - Confidence conf(X → Y) = |X′ ∩ Y′| / |X′|
• The implication X ⇒ Y holds iff X′ ⊆ Y′
Association Rule
m2 → m3
σ(m2 → m3) = 3/5 = 60%
conf(m2 → m3) = 3/4 = 75%
Implication
m3 ⇒ m2
conf(m3 ⇒ m2) = 3/3 = 100%
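The figures above can be checked with a few lines of code. This sketch assumes the same five-object example context (cross placement inferred from the values quoted on the slides); the function names are illustrative.

```python
from typing import Dict, Set

CONTEXT: Dict[str, Set[str]] = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes: Set[str]) -> Set[str]:
    """X' : objects having all attributes in X."""
    return {g for g, desc in CONTEXT.items() if attributes <= desc}


def support(x: Set[str], y: Set[str]) -> float:
    """sigma(X -> Y) = |X' ∩ Y'| / |G|."""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x: Set[str], y: Set[str]) -> float:
    """conf(X -> Y) = |X' ∩ Y'| / |X'| (X' is assumed to be non-empty)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def holds(x: Set[str], y: Set[str]) -> bool:
    """The implication X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(holds({"m3"}, {"m2"}), holds({"m2"}, {"m3"}))
```

An implication is thus an association rule with confidence 1, which is why m3 ⇒ m2 holds while m2 → m3 remains a rule with confidence 3/4.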
Complex Data
Formal Concept Analysis cannot deal with such complex data:
      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5
• Numeric Data
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure (G, (D, ⊓), δ) is composed of
  - G, a set of objects
  - (D, ⊓), the set of descriptions with a similarity operation ⊓ over them
  - δ, a mapping such that δ(g) ∈ D describes object g
      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
• δ(g1) = ⟨[5, 5], [7, 7], [6, 6]⟩
• δ(g2) = ⟨[6, 6], [8, 8], [4, 4]⟩
• [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]
• δ(g1) ⊓ δ(g2) = ⟨[5, 6], [7, 8], [4, 6]⟩
Pattern Structures
The Galois connection for (G, (D, ⊓), δ) is defined as:
• The maximal description representing the similarity of a set of objects:
  A□ = ⊓_{g ∈ A} δ(g) for A ⊆ G
• The maximal set of objects sharing a given description:
  d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)
Pattern Concept
• {g1, g2}□ = ⟨[5, 6], [7, 8], [4, 6]⟩
• ⟨[5, 6], [7, 8], [4, 6]⟩□ = {g1, g2}
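For interval descriptions, the similarity operation and the two derivation operators can be sketched as follows. The code uses the three-object numeric table of the example and takes the subsumption d ⊑ e to mean d ⊓ e = d, the usual order induced by a similarity operation; all function names are assumptions made for illustration.

```python
from functools import reduce
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]
Description = List[Interval]

# Numeric example table restricted to g1, g2, g3.
TABLE: Dict[str, Tuple[int, int, int]] = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
}


def delta(g: str) -> Description:
    """Describe an object by degenerate intervals [v, v]."""
    return [(v, v) for v in TABLE[g]]


def meet(d1: Description, d2: Description) -> Description:
    """Similarity: componentwise smallest interval covering both."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]


def common_description(objects: Set[str]) -> Description:
    """A□ : similarity of the descriptions of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in sorted(objects)))


def sharing_objects(d: Description) -> Set[str]:
    """d□ : objects whose description is subsumed by d (d ⊓ δ(g) = d)."""
    return {g for g in TABLE if meet(d, delta(g)) == d}


d = common_description({"g1", "g2"})
print(d, sorted(sharing_objects(d)))
```

The object g3 is excluded because its m1 value 4 lies outside the interval [5, 6], so ({g1, g2}, ⟨[5, 6], [7, 8], [4, 6]⟩) is closed on both sides.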
Complex Data
      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5
• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object.
[Figure: Subject (U ∪ B), linked by a predicate (U) to Object (U ∪ B ∪ L)]
U = URI, B = Blank Nodes, L = Literal
Example of RDF Graph for Publications
[Figure: NapoliLS97 with dc:title title1, dc:creator Amedeo_Napoli and dc:subject Classification]
To avoid confusion, we will refer to objects in FCA as entities.
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: taxonomy over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
[Figure: pattern concept lattice, drawn in levels K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
Details of the pattern concept lattice
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Experimentation
Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s
Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: concept lattice, drawn in levels K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
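One simple way to picture the analyst's feedback is to hide every concept whose intent mentions the unwanted class. The sketch below does only that; it is an assumption made for illustration and not necessarily how the cited work prunes the lattice. The concept table is copied from the slide and the function name `hide_class` is hypothetical.

```python
# Concepts from the slide: identifier -> (extent, intent).
CONCEPTS = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide_class(concepts, unwanted):
    """Keep only the concepts whose intent does not contain the unwanted class."""
    return {kid: (extent, intent)
            for kid, (extent, intent) in concepts.items()
            if unwanted not in intent}


visible = hide_class(CONCEPTS, "C7")
print(sorted(visible))
```

For C7 this leaves out K3, K6 and K7, the three concepts whose intent contains C7.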
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }
[Figure: graph pattern of the query. ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]
Mapping $\mu : V \to U$
Given U and V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification
[Figure: matching RDF graph. NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]
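To make the notion of a mapping concrete, the sketch below matches the three triple patterns of the query against the small example graph and returns each solution as a dictionary from variables to values. It is a minimal illustration under stated assumptions: the triples are a reading of the example figure, the FILTER is ignored, and the helper names `extend` and `match` are hypothetical.

```python
# Example graph, as read from the figure (an assumption about the edge layout).
TRIPLES = [
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
]

# Triple patterns of the query; terms starting with "?" are variables.
PATTERNS = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]


def extend(mapping, pattern, triple):
    """Extend a partial mapping so that pattern matches triple, or return None."""
    mu = dict(mapping)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu


def match(patterns, triples):
    """All mappings that satisfy every triple pattern."""
    mappings = [{}]
    for pattern in patterns:
        extended = []
        for mu in mappings:
            for triple in triples:
                candidate = extend(mu, pattern, triple)
                if candidate is not None:
                    extended.append(candidate)
        mappings = extended
    return mappings


solutions = match(PATTERNS, TRIPLES)
print(solutions)
```

The single solution binds ?keywords to Classification, which is the mapping given as the example on the slide.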
Lattice-Based View Access [Alam et al. CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1
• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
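The effect of VIEW BY can be sketched as turning the answer table into a formal context: the values of the chosen variable become the entities, and the values of the remaining variables become their attributes. The code below is an illustrative sketch only; the rows are the ones listed on the slide, and the function name `view_by` and the (variable, value) encoding of attributes are assumptions.

```python
from collections import defaultdict

COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, entity_var):
    """Group the answers by entity_var; other columns become attributes."""
    entity_index = columns.index(entity_var)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != entity_index:
                context[row[entity_index]].add((columns[i], value))
    return dict(context)


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(by_title["title1"])
print(by_author["author2"])
```

Switching the argument from "title" to "author" gives the same answers from another perspective, with authors as entities.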
Views from Different Perspectives
Example
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
Example
    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: ?x is linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a value ≤ 1900 by dbp:birthDate]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }
French Films
[Figure: ?x is linked to FrenchFilm by dcterms:subject]
    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }
[Figure: ?x is linked to Film by rdf:type and to France by dbo:hasCountry]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>
Predicates                Objects
Index   URI               Index   URI
A       dc:subject        a       dbpc:Computer_Scientists
                          b       dbpc:Turing_Award_Laureates
B       dbp:award         c       dbp:TuringAward
C       rdf:type          d       dbo:Scientist
D       dbp:field         e       dbp:Computer Sciences
E       dbp:birthPlace    f       dbo:UnitedStates
                          g       dbo:UnitedKingdom
Formal context with columns A (a, b), B (c), C (d), D (e), E (f, g); crosses per entity:
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
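The step from triples to a formal context can be sketched as follows: every subject becomes an entity and every (predicate, object) pair becomes a binary attribute. This is a minimal illustration using only the four Person1 triples quoted on the slide; the function names `to_context` and `cross_table` are hypothetical.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def to_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


def cross_table(context):
    """Return the sorted attribute list and one row of crosses per entity."""
    attributes = sorted({m for attrs in context.values() for m in attrs})
    rows = {g: ["x" if m in attrs else "" for m in attributes]
            for g, attrs in context.items()}
    return attributes, rows


context = to_context(TRIPLES)
attributes, rows = cross_table(context)
print(attributes)
print(rows["Person1"])
```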
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete
Rule               Confidence   Support   Meaning
d ⟹ c             100%         5         Every scientist has won a Turing Award
c ⟹ d             100%         5         Every person who has won a Turing Award is a scientist
e ⟹ c, d          100%         2         Every person whose field is computer science is a Turing Award winning scientist
c, d ⟹ a, b       100%         5         All the scientists who won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d        71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
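The reasoning on this slide can be checked with a few lines of Python. The sketch below is an illustration under an assumption: it uses only the attributes a to d of the running example, with Person1 to Person5 having all four and Person6 and Person7 having only a and b, which is what the confidence and support values in the rule table imply. The function names are hypothetical.

```python
# Restriction of the running example to attributes a-d (inferred from the rule table).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities that have every attribute in attrs."""
    return {g for g, owned in CONTEXT.items() if attrs <= owned}


def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'|."""
    return len(extent(x | y)) / len(extent(x))


def is_definition(x, y):
    """X ≡ Y when both X => Y and Y => X hold."""
    return confidence(x, y) == 1.0 and confidence(y, x) == 1.0


def needs_completion(x, y):
    """Entities that satisfy X but not Y, i.e. counter-examples of X => Y."""
    return extent(x) - extent(y)


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(sorted(needs_completion({"a", "b"}, {"c", "d"})))
```

The entities returned by `needs_completion` are the candidates whose descriptions would have to be completed for a, b ≡ c, d to become a definition.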
Possible Scenario [Alam et al. IJCAI 2015]
[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    governmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hide non-interesting parts of the lattice
• Search capabilities
Can we perform data analysis over Web of Data?
Possible Use-Cases
• Groups of authors working together
• Detect the diversity of an author
• Detect the major area of research of an author
• Given a paper, is it possible to retrieve similar published papers?
What do we need?
• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert
Contributions
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
[Figure: Interactive Knowledge Discovery + Completion]
Related Work
Our contributions are in the direction of the following works:
• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)
     m1   m2   m3
g1   ×
g2   ×    ×
g3        ×    ×
g4        ×    ×
g5   ×    ×    ×
[Figure: concept lattice]
Formal Concept Analysis
[Ganter and Wille 1999]
• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$ for $g_2$
• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$ for $m_1$
     m1   m2   m3
g1   ×
g2   ×    ×
g3        ×    ×
g4        ×    ×
g5   ×    ×    ×
[Figure: concept lattice]
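The two derivation operators and the object and attribute concepts above can be reproduced with a short Python sketch. The context is the one from the slide, with the crosses placed so that they agree with the two examples; the function names are hypothetical and the code is meant as an illustration, not as an efficient FCA implementation.

```python
# Formal context from the slide: object -> attributes it has.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': attributes shared by every object in the set."""
    return {m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objects)}


def extent(attributes):
    """B': objects that have every attribute in the set."""
    return {g for g, owned in CONTEXT.items() if attributes <= owned}


def object_concept(g):
    """(g'', g') for a single object g."""
    shared = intent({g})
    return extent(shared), shared


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    owners = extent({m})
    return owners, intent(owners)


print(object_concept("g2"))
print(attribute_concept("m1"))
```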
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \to Y$ is an association rule with
  - Support $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association rule $m_2 \to m_3$
  $\sigma(m_2 \to m_3) = 3/5 = 60\%$
  $conf(m_2 \to m_3) = 3/4 = 75\%$
Implication $m_3 \Rightarrow m_2$
  $conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
     m1   m2   m3
g1   ×
g2   ×    ×
g3        ×    ×
g4        ×    ×
g5   ×    ×    ×
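The support and confidence values on this slide can be recomputed directly from the context. The following sketch assumes the same five-object context as the table, and the function names are hypothetical.

```python
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """Objects that have every attribute in the set (the ' operator)."""
    return {g for g, owned in CONTEXT.items() if attributes <= owned}


def support(x, y):
    """sigma(X -> Y) = |X' ∩ Y'| / |G|."""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'|."""
    return len(extent(x) & extent(y)) / len(extent(x))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}))
```

An implication is simply an association rule with confidence 1, which is why m3 ⟹ m2 holds while m2 → m3 is only an association rule.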
Complex Data
Formal Concept Analysis cannot deal with such complex data
• Numeric Data
       m1   m2   m3
  g1   5    7    6
  g2   6    8    4
  g3   4    8    5
  g4   4    9    8
  g5   5    8    5
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$
     m1   m2   m3
g1   5    7    6
g2   6    8    4
g3   4    8    5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects
  $A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
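For interval descriptions, the similarity operation and the two operators of the Galois connection can be sketched in Python as below. The three objects g1, g2 and g3 are the ones from the interval example; the function names are hypothetical, and the subsumption test relies on the usual convention that d ⊑ e exactly when d ⊓ e = d.

```python
from functools import reduce

# Each object is described by one degenerate interval [v, v] per attribute.
DELTA = {
    "g1": [(5, 5), (7, 7), (6, 6)],
    "g2": [(6, 6), (8, 8), (4, 4)],
    "g3": [(4, 4), (8, 8), (5, 5)],
}


def meet(d1, d2):
    """Similarity of two descriptions: component-wise convex hull of intervals."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]


def subsumed(d, e):
    """d ⊑ e iff d ⊓ e == d, i.e. every interval of e lies inside that of d."""
    return meet(d, e) == d


def common_description(objects):
    """A^□: the meet of the descriptions of all objects in A (A non-empty)."""
    return reduce(meet, (DELTA[g] for g in objects))


def extent(description):
    """d^□: the objects whose description is subsumed by d."""
    return {g for g, e in DELTA.items() if subsumed(description, e)}


d = common_description({"g1", "g2"})
print(d)
print(extent(d))
```

Because the extent of the computed description is again {g1, g2}, the pair forms a pattern concept, as stated on the slide.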
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]
       m1   m2   m3
  g1   5    7    6
  g2   6    8    4
  g3   4    8    5
  g4   4    9    8
  g5   5    8    5
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) linked by a predicate ($U$) to Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]
To avoid confusion, we will refer to objects in FCA as entities.
RDF to Entity-Descriptions
tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS
RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: ACM Computing Classification Taxonomy T over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]
Euler tour of the tree, with the depth D of each visited node:
Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
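The depth array D can be produced by a depth-first Euler tour of the taxonomy, after which the least common ancestor of two classes is the node of minimal depth between their positions. The sketch below illustrates this under a few assumptions: the tree is the one implied by the tour on the slide, "T" stands for the top element, the function names are hypothetical, and the minimum is found by a linear scan instead of a constant-time RMQ structure.

```python
# Taxonomy implied by the Euler tour on the slide: node -> ordered children.
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths, revisiting parents on the way back."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)   # back at the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, u, v):
    """Least common ancestor: node of minimal depth between the two first visits."""
    i, j = sorted((nodes.index(u), nodes.index(v)))
    k = min(range(i, j + 1), key=depths.__getitem__)
    return nodes[k]


nodes, depths = euler_tour(TREE, "T")
print(depths)
print(lca(nodes, depths, "C4", "C5"), lca(nodes, depths, "C1", "C7"))
```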
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
Graph pattern: ?paper dc:creator ?author; ?paper dc:title ?title; ?paper dc:subject ?keywords
Mapping $\mu : V \to U$
Given a set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification
Matching RDF graph: NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:title title1; NapoliLS97 dc:subject Classification
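The mapping $\mu$ can be pictured as a dictionary from query variables to URIs. The sketch below is only an illustration under that assumption: `apply_mapping` is a hypothetical helper, not part of any SPARQL library, and it returns `None` when a variable lies outside the domain of the partial function.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]

def apply_mapping(mu: Dict[str, str], pattern: Triple) -> Optional[Triple]:
    """Replace each variable (a term starting with '?') by its image under mu.

    mu is a partial function, so a variable may be unmapped; in that case
    the pattern cannot be instantiated and None is returned.
    """
    instantiated = []
    for term in pattern:
        if term.startswith("?"):
            if term not in mu:
                return None
            instantiated.append(mu[term])
        else:
            instantiated.append(term)
    return tuple(instantiated)

# Mapping taken from the example graph on the slide.
mu1 = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?title": "title1",
    "?keywords": "Classification",
}

print(apply_mapping(mu1, ("?paper", "dc:subject", "?keywords")))
```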
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
?title | ?keyword | ?author
title1 | RDF | author1
title2 | Web Crawling | author1
title2 | Web Search Engines | author2
title2 | RDF | author1
title2 | RDF | author2
title2 | Web Crawling | author1
• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title_1, title_2, \dots\}$
• $M_1 = \mu(A_{v1}) = \{author_1, author_2, \dots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results, Mehwish Alam, Amedeo Napoli, International Conference on Concept Lattices and Their Applications, 2014
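As a rough sketch of what VIEW BY does with the answer tuples, the snippet below groups query results by the chosen entity variable and treats the bindings of the remaining variables as attributes. The rows are a small illustrative subset of the table above, and `view_by` is a hypothetical helper name, not the LBVA implementation.

```python
from collections import defaultdict

# Illustrative answer tuples (a subset of the rows shown on the slide).
rows = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]

def view_by(rows, entity_var):
    """Build a formal context {entity: set of (variable, value) attributes}.

    The bindings of entity_var play the role of G; the bindings of all
    other variables play the role of the attribute sets M1, M2, ...
    """
    context = defaultdict(set)
    for row in rows:
        entity = row[entity_var]
        for var, value in row.items():
            if var != entity_var:
                context[entity].add((var, value))
    return dict(context)

by_title = view_by(rows, "title")    # VIEW BY ?title
by_author = view_by(rows, "author")  # VIEW BY ?author
print(by_title["title2"])
```

Changing only the `entity_var` argument gives the same answers from another perspective, which is the idea behind viewing by title or by author.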
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
Graph pattern: ?x rdf:type Person; ?x dbp:birthPlace Berlin; ?x dbp:birthDate ≤ 1900
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
Through the category (?x dcterms:subject FrenchFilm):
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}
Through the type and the country (?x rdf:type Film; ?x dbo:hasCountry France):
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>
Predicates
Index | URI
A | dc:subject
B | dbp:award
C | rdf:type
D | dbp:field
E | dbp:birthPlace
Objects
Index | URI
a | dbpc:Computer_Scientists
b | dbpc:Turing_Award_Laureates
c | dbp:TuringAward
d | dbo:Scientist
e | dbp:Computer Sciences
f | dbo:UnitedStates
g | dbo:UnitedKingdom
Formal context over the columns A: a, b; B: c; C: d; D: e; E: f, g
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
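One possible way to picture the scaling step is sketched below: every subject becomes an object of the context, every (predicate, object) pair becomes an attribute, and each triple contributes one cross. The function names are hypothetical and the triple list only reuses the four Person1 triples shown above, with identifiers lightly normalized for the example.

```python
# Triples for Person1, written as (subject, predicate, object).
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

def to_formal_context(triples):
    """Return (G, M, I): subjects, (predicate, object) attributes, incidence."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence

def cross_table(objects, attributes, incidence):
    """One boolean row per object; True marks a cross."""
    return [[(g, m) in incidence for m in attributes] for g in objects]

G, M, I = to_formal_context(triples)
for g, row in zip(G, cross_table(G, M, I)):
    print(g, "".join("x" if cell else "." for cell in row))
```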
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete
Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$ | 100% | 5 | All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
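The check behind the example can be sketched in a few lines. The context below is an assumption: it keeps only attributes a to d and is chosen to be consistent with the rule table (a and b for all seven persons, c and d for Person1 to Person5). The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed context restricted to attributes a-d, consistent with the rule table.
context = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
context["Person6"] = {"a", "b"}
context["Person7"] = {"a", "b"}

def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in context.items() if attrs <= desc}

def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'| (X' must be non-empty)."""
    ex = extent(x)
    return Fraction(len(ex & extent(y)), len(ex))

def needs_completion(x, y):
    """Entities that satisfy X but not Y: candidates for added triples."""
    return sorted(extent(x) - extent(y))

ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))        # the implication c,d => a,b
print(float(confidence(ab, cd)))  # the high-confidence converse
print(needs_completion(ab, cd))
```

If a domain expert accepts $a, b \equiv c, d$ as a definition, the entities returned by `needs_completion` are the ones whose descriptions would receive the missing attributes.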
Possible Scenario [Alam et al., IJCAI 2015]
Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis, Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli, International Joint Conference on Artificial Intelligence, 2015
Experimentation
Dataset building conditions
 | Cars | Videogames | Smartphones | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates | rdf:type | rdf:type | rdf:type | rdf:type
 | dc:subject | dc:subject | dc:subject | dc:subject
 | bodyStyle | cp | manufacturer | language
 | transmission | developer | operativeSystem | governmentType
 | assembly | requirement | developer | leaderType
 | designer | genre | cpu | foundingDate
 | layout | releaseDate | | gdpPppRank
FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
 | Cars | Videogames | Smartphones | Countries
Subjects | 529 | 655 | 363 | 3153
Objects | 1291 | 3265 | 495 | 8315
Triples | 12519 | 20146 | 4710 | 50000
Concepts | 14657 | 31031 | 1232 | 13754
Exec. time [s] | 1732 | 1714 | 07 | 5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Can we perform data analysis over Web of Data?
Possible Use-Cases
• groups of authors working together
• detect the diversity of an author
• detect the major area of research of an author
• given a paper, is it possible to retrieve similar published papers?
What do we need?
• A strong formalism which allows not only information retrieval but also classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert
Contributions
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
Interactive Knowledge Discovery + Completion
Related Work
Our contributions are in the direction of the following works:
• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$
  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)
   | m1 | m2 | m3
g1 | ×  |    |
g2 | ×  | ×  |
g3 |    | ×  | ×
g4 |    | ×  | ×
g5 | ×  | ×  | ×
[Figure: Concept Lattice]
Formal Concept Analysis
[Ganter and Wille 1999]
• Object Concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$ for $g_2$
• Attribute Concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$ for $m_1$
[Figure: Concept Lattice of the example context with objects g1 to g5 and attributes m1 to m3]
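The two derivation operators and the object and attribute concepts can be sketched directly on the running example. In the snippet below the cross positions of the context are an assumption inferred from the worked examples on the slides, and all function names are illustrative.

```python
# Example context; cross positions inferred from the worked examples (assumption).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}

def intent(objects):
    """A': attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared

def extent(attributes):
    """B': objects that have every attribute in `attributes`."""
    return {g for g, desc in CONTEXT.items() if set(attributes) <= desc}

def is_formal_concept(objects, attributes):
    """(A, B) is a concept iff A' = B and B' = A."""
    return intent(objects) == set(attributes) and extent(attributes) == set(objects)

def object_concept(g):
    """(g'', g')."""
    b = intent({g})
    return extent(b), b

def attribute_concept(m):
    """(m', m'')."""
    a = extent({m})
    return a, intent(a)

print(object_concept("g2"))
print(attribute_concept("m1"))
```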
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \to Y$ is an association rule with
  - Support $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association Rule $m_2 \to m_3$
$\sigma(m_2 \to m_3) = 3/5 = 60\%$
$conf(m_2 \to m_3) = 3/4 = 75\%$
Implication $m_3 \Rightarrow m_2$
$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
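The two measures can be checked on the example context with a short sketch. The context below is again the one inferred from the slide examples (an assumption), and the function names are illustrative.

```python
from fractions import Fraction

# Example context; cross positions inferred from the worked examples (assumption).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}

def extent(attributes):
    """X': objects that have every attribute in `attributes`."""
    return {g for g, desc in CONTEXT.items() if attributes <= desc}

def support(x, y):
    """sigma(X -> Y) = |X' ∩ Y'| / |G|."""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))

def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'| (X' must be non-empty)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))

def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)

print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}))
```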
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric Data
     | m1 | m2 | m3
  g1 | 5  | 7  | 6
  g2 | 6  | 8  | 4
  g3 | 4  | 8  | 5
  g4 | 4  | 9  | 8
  g5 | 5  | 8  | 5
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$
   | m1 | m2 | m3
g1 | 5  | 7  | 6
g2 | 6  | 8  | 4
g3 | 4  | 8  | 5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects
  $A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
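For interval descriptions, both operators of this Galois connection fit in a few lines. The sketch below uses the three-object table from the previous slide; the function names are illustrative, and subsumption is tested through the usual identity $d \sqsubseteq e \iff d \sqcap e = d$.

```python
from functools import reduce

# delta(g): one degenerate interval per numeric attribute (three-object example).
DELTA = {
    "g1": [(5, 5), (7, 7), (6, 6)],
    "g2": [(6, 6), (8, 8), (4, 4)],
    "g3": [(4, 4), (8, 8), (5, 5)],
}

def meet(d1, d2):
    """Component-wise similarity: [min of lower bounds, max of upper bounds]."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]

def subsumes(d, e):
    """d ⊑ e iff d ⊓ e == d, i.e. every interval of e lies inside that of d."""
    return meet(d, e) == list(d)

def description(objects):
    """A-square: meet of the descriptions of a non-empty set of objects."""
    return reduce(meet, (DELTA[g] for g in objects))

def extent(d):
    """d-square: objects whose description is subsumed by d."""
    return {g for g, e in DELTA.items() if subsumes(d, e)}

pattern = description({"g1", "g2"})
print(pattern, extent(pattern))
```

Here g3 is excluded from the extent because its value 4 for m1 falls outside the interval [5, 6].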
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources contain only RDF triples
• Some resources contain only schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on the RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
Subject ($U \cup B$) -- predicate ($U$) --> Object ($U \cup B \cup L$)
U = URIs, B = Blank Nodes, L = Literals
Example of RDF Graph for Publications
NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:title title1; NapoliLS97 dc:subject Classification
To avoid confusion, we will call objects in FCA entities.
RDF to Entity-Descriptions
tid | Subject | Predicate | Object | Provenance
t1 | s1 | dc:subject | o11 | DBLP
t2 | s2 | dc:subject | o16 | DBLP
t3 | s1 | rdf:type | Publication | DBLP
t4 | o11 | rdfs:subClassOf | C1 | ACCS
t5 | o12 | rdfs:subClassOf | C2 | ACCS
t6 | C1 | rdfs:subClassOf | C10 | ACCS
RDF triples from several resources
Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy $T$ with root ⊤ and classes C1, C2 and C4 to C15]
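A possible reading of the two tables is that each keyword attached to a paper through dc:subject is replaced by the taxonomy class it is declared a subclass of. The sketch below follows that reading, which is an assumption rather than the documented procedure; it uses only the six triples listed above, so the resulting descriptions are partial, and `entity_descriptions` is a hypothetical name.

```python
from collections import defaultdict

# (subject, predicate, object, provenance) as listed in the table above.
quads = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]

def entity_descriptions(quads):
    """Map each entity to the set of classes reached from its keywords.

    A keyword with a known rdfs:subClassOf parent is replaced by that
    parent; a keyword without one is kept unchanged.
    """
    parent = {s: o for s, p, o, _ in quads if p == "rdfs:subClassOf"}
    descriptions = defaultdict(set)
    for s, p, o, _ in quads:
        if p == "dc:subject":
            descriptions[s].add(parent.get(o, o))
    return dict(descriptions)

print(entity_descriptions(quads))
```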
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets, Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev, International Conference on Concept Lattices and Their Applications, 2015
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy with root ⊤; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]
Euler tour of the taxonomy with the depth D of each visited node:
Index:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D:      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
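The arrays above can be reproduced with a depth-first traversal that records a node each time it is entered or returned to. The sketch below assumes the tree structure implied by the tour, writes the root ⊤ as "T", and replaces a constant-time RMQ structure with a naive linear scan to keep the example short.

```python
# Taxonomy implied by the Euler tour on the slide; "T" stands for the root.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Return (nodes, depths) of the Euler tour starting at root."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # back in `node` after finishing `child`
            depths.append(depth)

    visit(root, 0)
    return nodes, depths

NODES, DEPTHS = euler_tour("T")
FIRST = {}
for index, node in enumerate(NODES):
    FIRST.setdefault(node, index)   # first occurrence of each node (0-based)

def rmq(i, j):
    """Index of the minimal depth between positions i and j (naive scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=DEPTHS.__getitem__)

def lca(u, v):
    """Least common ancestor = shallowest node between the two occurrences."""
    return NODES[rmq(FIRST[u], FIRST[v])]

print(DEPTHS)
print(lca("C1", "C5"), lca("C7", "C9"))
```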
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?
Index:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D:      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Positions in the tour: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$: positions 8 (C12) and 15 (⊤) are more general than the other results, and positions 19 and 21 both denote C13.
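The computation on this slide can be sketched as follows: merge the tour positions of both sets, take the LCA of neighbouring elements that come from different sets, and keep only the most specific results. This is a minimal sketch under those assumptions, with a naive linear scan in place of a constant-time RMQ structure; the root ⊤ is written "T" and the function names are illustrative.

```python
# Euler tour from the slide (0-based here, 1-based on the slide).
NODES = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "T"]
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
FIRST = {}
for index, node in enumerate(NODES):
    FIRST.setdefault(node, index)

def lca(u, v):
    """Node of minimal depth between the first occurrences of u and v."""
    lo, hi = sorted((FIRST[u], FIRST[v]))
    return NODES[min(range(lo, hi + 1), key=DEPTHS.__getitem__)]

def is_ancestor_or_equal(u, v):
    return lca(u, v) == u

def meet(a, b):
    """Similarity of two antichains of taxonomy classes."""
    # Merge both sets in tour order, remembering where each element came from.
    merged = sorted([(FIRST[x], 0, x) for x in a] + [(FIRST[y], 1, y) for y in b])
    candidates = set()
    for (_, side1, n1), (_, side2, n2) in zip(merged, merged[1:]):
        if side1 != side2:               # only pairs across A and B matter
            candidates.add(lca(n1, n2))
    # Drop every candidate that is a strict ancestor of another candidate.
    return {c for c in candidates
            if not any(d != c and is_ancestor_or_equal(c, d) for d in candidates)}

print(meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

Restricting the LCA computation to neighbours in tour order is enough here, because the LCA of two elements that are further apart is always an ancestor of the LCA of a pair lying between them.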
Resulting Pattern Concept Lattice
Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy with root ⊤ and classes C1, C2 and C4 to C15]
[Figure: Pattern Concept lattice with levels K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
KID | Extent | Intent
K1 | s1, s2, s4, s5 | (p1, {C14})
K2 | s1, s3, s4 | (p1, {C12})
K3 | s1, s4 | (p1, {C7, C12})
K4 | s2, s4, s5 | (p1, {C8})
K5 | s3, s4 | (p1, {C4})
K6 | s1 | (p1, {C1, C2, C7})
K7 | s4 | (p1, {C4, C7, C8})
K8 | s2, s5 | (p1, {C8, C9})
K9 | s3 | (p1, {C4, C5})
K10 | s2 | (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
DBLP | 5293 | 33207 | 33198 | 10134 | 45s | 21s
Biomedical Data | 63 | 1490 | 933 | 1725582 | 145s | 162s
Experiments on Linked Data
Dataset | entities | attributes | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
BK | 96 | 5 | 35 | 626 | 10 | 840897 | 37 sec | 42 sec
LO | 16 | 7 | 16 | 224 | 26 | 1875 | 0043 sec | 0088 sec
NT | 131 | 3 | 131 | 140 | 6 | 128624 | 36 sec | 68 sec
PO | 60 | 16 | 22 | 1236 | 58 | 416837 | 49 sec | 57 sec
PT | 5000 | 49 | 22 | 4084 | 60 | 452316 | 50 sec | 38 sec
PW | 200 | 11 | 94 | 436 | 21 | 1148656 | 60 sec | 49 sec
PY | 74 | 28 | 36 | 340 | 53 | 771569 | 46 sec | 40 sec
QU | 2178 | 4 | 44 | 8212 | 8 | 783013 | 28 sec | 30 sec
TZ | 186 | 61 | 31 | 626 | 88 | 650041 | 58 sec | 43 sec
VY | 52 | 4 | 52 | 202 | 15 | 202666 | 59 sec | 116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice with concepts K0 to K11; the extents and intents of K1 to K10 are those listed under "Resulting Pattern Concept Lattice"]
Interactive Exploration over RDF Data using Formal Concept Analysis, Mehwish Alam, Amedeo Napoli, IEEE International Conference on Data Science and Advanced Analytics, 2015
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        All people whose field is computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$     100%        5        All scientists who won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
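A small Python sketch of this reasoning follows. It is restricted to the attributes a to d, and the context is an assumption reconstructed from the supports in the rule table (Person1 to Person5 carry a, b, c, d; Person6 and Person7 carry only a, b); the helper names are hypothetical.

```python
from fractions import Fraction

# Assumed context, restricted to attributes a-d (e, f, g are left out).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attrs):
    """Entities that have every attribute in attrs (the derivation X')."""
    return {g for g, row in CONTEXT.items() if attrs <= row}

def confidence(x, y):
    """conf(X -> Y) = |X' & Y'| / |X'|."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))

def completion_candidates(x, y):
    """Entities that satisfy X but lack part of Y."""
    return sorted(extent(x) - extent(y))

ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))              # the implication c,d => a,b
print(float(confidence(ab, cd)))       # the association rule a,b -> c,d
print(completion_candidates(ab, cd))   # entities to complete
```

Under these assumptions the rule from c, d to a, b has confidence 1, the converse has confidence 5/7, and the entities breaking the converse are exactly Person6 and Person7.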
Possible Scenario [Alam et al., IJCAI 2015]
[Workflow figure: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions

Dataset      Cars               Videogames   Smartphones        Countries
Restriction  dc:subject         dc:subject   dc:subject         rdf:type
             dbpc:Sports_cars   dbpc:FPS     dbpc:Smartphones   Country
Predicates   rdf:type           rdf:type     rdf:type           rdf:type
             dc:subject         dc:subject   dc:subject         dc:subject
             bodyStyle          cp           manufacturer       language
             transmission       developer    operativeSystem    governmentType
             assembly           requirement  developer          leaderType
             designer           genre        cpu                foundingDate
             layout             releaseDate                     gdpPppRank
FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  17.32   17.14       0.7          59.82
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation is used as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
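One plausible way to obtain such precision and recall points from a ranked list of implications is sketched below. The judgments are hypothetical (they are not the evaluation data of the talk), and the function name `precision_recall_points` is an assumption made for illustration.

```python
def precision_recall_points(judgments):
    """judgments: booleans in rank order, True = accepted as a definition.

    Returns one (recall, precision) pair per rank position.
    """
    total = sum(judgments)
    if total == 0:
        return []
    points, hits = [], 0
    for k, accepted in enumerate(judgments, start=1):
        hits += accepted
        points.append((hits / total, hits / k))
    return points

# Hypothetical human judgments for ten ranked implications.
judgments = [True] * 9 + [False]
points = precision_recall_points(judgments)
print(points[-1])
```

With nine accepted implications out of ten, the last point has recall 1.0 and precision 0.9, which matches the reading of "a precision of 0.9" given on the slide.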
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Google Knowledge Graph
Can we perform data analysis over Web of Data?
Possible Use-Cases
• groups of authors working together
• detect the diversity of an author
• detect the major area of research for an author
• given a paper, is it possible to retrieve similar published papers?
What do we need?
• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert
Contributions
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
Interactive Knowledge Discovery + Completion
Related Work
Our contributions are in the direction of the following works:
• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{ m \in M \mid \forall g \in A \subseteq G,\ (g, m) \in I \}$
  $B' = \{ g \in G \mid \forall m \in B \subseteq M,\ (g, m) \in I \}$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×

[Figure: concept lattice of this context]
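The two derivation operators and the resulting concepts can be illustrated with a short Python sketch. The cross positions of the toy context are an assumption, reconstructed from the worked examples of the talk (for instance the attribute concept ({g1, g2, g5}, {m1})); the brute-force enumeration over all attribute subsets is only suitable for such a tiny context and is not the algorithm used in the talk.

```python
from itertools import combinations

# Assumed toy context: object -> set of attributes.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
M = {"m1", "m2", "m3"}

def intent(objects):
    """A': attributes shared by all objects in A (all of M when A is empty)."""
    rows = [CONTEXT[g] for g in objects]
    return set.intersection(*rows) if rows else set(M)

def extent(attrs):
    """B': objects having all attributes in B."""
    return {g for g, row in CONTEXT.items() if attrs <= row}

def concepts():
    """Close every attribute subset; keep the distinct (extent, intent) pairs."""
    found = set()
    for k in range(len(M) + 1):
        for attrs in combinations(sorted(M), k):
            ext = extent(set(attrs))
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found

for ext, itt in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), sorted(itt))
```

For this context the enumeration yields six concepts, from (G, ∅) at the top down to ({g5}, {m1, m2, m3}) at the bottom.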
Formal Concept Analysis
[Ganter and Wille 1999]
• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$

      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×

[Figure: concept lattice of this context]
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association rule $m_2 \rightarrow m_3$
  $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
  $\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$
Implication $m_3 \Rightarrow m_2$
  $\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$

      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×
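The support and confidence values of this slide can be checked with the following Python sketch. The cross positions of the context are an assumption reconstructed from the slide's own numbers, and the helper names are hypothetical.

```python
from fractions import Fraction

# Assumed toy context: object -> set of attributes.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}

def extent(attrs):
    """X': objects having every attribute in X."""
    return {g for g, row in CONTEXT.items() if set(attrs) <= row}

def support(x, y):
    """sigma(X -> Y) = |X' & Y'| / |G|."""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))

def confidence(x, y):
    """conf(X -> Y) = |X' & Y'| / |X'|."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))

def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)

print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}), implication_holds({"m2"}, {"m3"}))
```

An implication is simply an association rule with confidence 1, which is why m3 ⇒ m2 holds here while m2 → m3 remains a rule with confidence 3/4.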
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric Data

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5

• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
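The interval similarity operation can be written directly in Python. This is a minimal sketch; representing a description as a tuple of (low, high) pairs and the names `delta` and `meet` are choices made here for illustration.

```python
def delta(values):
    """Describe an object by degenerate intervals [v, v], one per attribute."""
    return tuple((v, v) for v in values)

def meet(d1, d2):
    """Component-wise similarity: [min(a1, a2), max(b1, b2)]."""
    return tuple(
        (min(a1, a2), max(b1, b2))
        for (a1, b1), (a2, b2) in zip(d1, d2)
    )

d_g1 = delta((5, 7, 6))
d_g2 = delta((6, 8, 4))
print(meet(d_g1, d_g2))  # ((5, 6), (7, 8), (4, 6))
```

The similarity of two descriptions is the smallest interval vector that covers both of them.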
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:
• The maximal description representing the similarity of a set of objects
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
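Both operators of this Galois connection can be sketched for interval descriptions as follows. The sketch uses the three-object table (g1, g2, g3) and tests subsumption through the usual identity d ⊑ e iff d ⊓ e = d; the function names are hypothetical.

```python
from functools import reduce

# Numeric table for g1, g2, g3 (attributes m1, m2, m3).
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}

def delta(g):
    """Description of object g as degenerate intervals."""
    return tuple((v, v) for v in DATA[g])

def meet(d1, d2):
    """Component-wise interval similarity."""
    return tuple(
        (min(a1, a2), max(b1, b2))
        for (a1, b1), (a2, b2) in zip(d1, d2)
    )

def subsumed(d, e):
    """d is subsumed by e (d is more general) iff meet(d, e) == d."""
    return meet(d, e) == d

def common_description(objects):
    """A-square: similarity of the descriptions of a non-empty object set."""
    return reduce(meet, (delta(g) for g in objects))

def extent(d):
    """d-square: all objects whose description is covered by d."""
    return {g for g in DATA if subsumed(d, delta(g))}

d = common_description(["g1", "g2"])
print(d, sorted(extent(d)))
```

Object g3 is excluded from the extent because its value 4 for m1 lies outside the interval [5, 6].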
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5

• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:subject--> Classification]
To avoid confusion, we will call the objects of FCA entities.
RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T with root ⊤ and nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]
The tree is traversed depth-first, and each visit of a node is recorded together with its depth D:

Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
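The node and depth arrays of this slide correspond to an Euler tour of the taxonomy, and the lowest common ancestor (LCA) of two nodes is the node of minimal depth between their first occurrences. A minimal Python sketch, assuming the tree structure read off the arrays and writing ⊤ as "TOP"; the range minimum is a naive linear scan here, not an optimized RMQ structure.

```python
# Taxonomy as child lists (TOP stands for the root ⊤).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Record each node on entry and again after returning from each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths

NODES, DEPTHS = euler_tour("TOP")
FIRST = {}
for position, name in enumerate(NODES):
    FIRST.setdefault(name, position)  # first occurrence of each node

def lca(u, v):
    """Node of minimal depth between the first occurrences of u and v."""
    i, j = sorted((FIRST[u], FIRST[v]))
    k = min(range(i, j + 1), key=DEPTHS.__getitem__)
    return NODES[k]

print(DEPTHS)
print(lca("C1", "C5"), lca("C7", "C9"), lca("C5", "C7"))
```

Positions in the code are 0-based, so index 4 of the slide (C1) is position 3 in `NODES`.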
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Positions of the elements: A = {4, 12, 20}, B = {4, 18, 22}
Merged in increasing order: A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ over consecutive positions: RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
Positions 8 (C12) and 15 (⊤) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so the intersection is {C1, C13}.
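The steps of this example can be reproduced with the Python sketch below. The arrays are copied from the slide (⊤ written as "TOP", positions 0-based), the range minimum is a naive scan, and pairing only consecutive positions that come from different sets is an assumption of this sketch; in the slide's example every consecutive pair is of that kind.

```python
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
FIRST = {}
for position, name in enumerate(NODES):
    FIRST.setdefault(name, position)  # first occurrence of each node

def rmq(i, j):
    """Position of the minimal depth in DEPTHS[i..j] (linear scan)."""
    return min(range(i, j + 1), key=DEPTHS.__getitem__)

def lca(u, v):
    """Lowest common ancestor through a range minimum query."""
    i, j = sorted((FIRST[u], FIRST[v]))
    return NODES[rmq(i, j)]

def similarity(a, b):
    """Most specific common ancestors of elements of a and elements of b."""
    merged = sorted([(FIRST[x], "A", x) for x in a] +
                    [(FIRST[x], "B", x) for x in b])
    candidates = {
        lca(x, y)
        for (_, side_x, x), (_, side_y, y) in zip(merged, merged[1:])
        if side_x != side_y
    }
    # Drop every candidate that is a proper ancestor of another candidate.
    return {
        c for c in candidates
        if not any(c != other and lca(c, other) == c for other in candidates)
    }

print(sorted(similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

The candidate set before filtering is {C1, C12, TOP, C13}, which corresponds to the slide's positions {4, 8, 15, 19, 21}.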
Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy with root ⊤ and nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
[Figure: pattern concept lattice, drawn in levels: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID  Extent           Intent
K1   s1, s2, s4, s5   (p1, {C14})
K2   s1, s3, s4       (p1, {C12})
K3   s1, s4           (p1, {C7, C12})
K4   s2, s4, s5       (p1, {C8})
K5   s3, s4           (p1, {C4})
K6   s1               (p1, {C1, C2, C7})
K8   s2, s5           (p1, {C8, C9})
K7   s4               (p1, {C4, C7, C8})
K9   s3               (p1, {C4, C5})
K10  s2               (p1, {C6, C8, C9})
Details of the pattern concept lattice
Experimentation

Dataset          |G|    |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293   33207  33198        10134    45 s   21 s
Biomedical Data  63     1490   933          1725582  145 s  162 s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec
Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach ()
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice, drawn in levels: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID  Extent           Intent
K1   s1, s2, s4, s5   (p1, {C14})
K2   s1, s3, s4       (p1, {C12})
K3   s1, s4           (p1, {C7, C12})
K4   s2, s4, s5       (p1, {C8})
K5   s3, s4           (p1, {C4})
K6   s1               (p1, {C1, C2, C7})
K8   s2, s5           (p1, {C8, C9})
K7   s4               (p1, {C4, C7, C8})
K9   s3               (p1, {C4, C5})
K10  s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER (
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: query graph pattern. ?paper --dc:creator--> ?author; ?paper --dc:title--> ?title; ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \rightarrow U$
Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords → Classification

[Figure: RDF graph. NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:subject--> Classification]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP, extended with a VIEW BY clause:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{\text{title}\}$
• $A_v = \{\text{author}, \text{keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{\text{author1}, \text{author2}, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
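How a VIEW BY clause could turn query results into a formal context is sketched below in Python. This is an illustration only: the function name `view_by` is hypothetical, the rows are the result rows of the slide, and each value of a non-entity variable is kept as a (variable, value) attribute.

```python
from collections import defaultdict

# Result rows of the example query, one dict per solution mapping.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]

def view_by(rows, entity_var):
    """Entities are the values of entity_var; all other bindings are attributes."""
    context = defaultdict(set)
    for row in rows:
        for variable, value in row.items():
            if variable != entity_var:
                context[row[entity_var]].add((variable, value))
    return dict(context)

by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(sorted(by_title["title1"]))
print(sorted(by_author))
```

Changing only the view variable, for example from "title" to "author", gives a different context and therefore a different lattice over the same query results.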
Views from Different Perspectives
Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
Can we perform data analysis over Web of Data
Possible Use-Cases
sbquo groups of authors working together
sbquo detect the diversity of an author
sbquo detect major area of research for an author
sbquo given a paper is it possible to retrieve similar papers published
Mehwish Alam Interactive Knowledge Discovery over Web of Data 6
Can we perform data analysis over Web of Data
Possible Use-Cases
sbquo groups of authors working together
sbquo detect the diversity of an author
sbquo detect major area of research for an author
sbquo given a paper is it possible to retrieve similar papers published
What do we need
sbquo A strong formalism which not only allows information retrieval but alsoallows classication and knowledge discovery
sbquo A structure which- is close to the original format of the web of data- allows search and navigation- Enable interaction with the domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 6
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Related Work
Our contributions follow the direction of these works:
• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$
  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)
      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×
[Figure: concept lattice of this context]
Formal Concept Analysis
[Ganter and Wille 1999]
• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. for $g_2$: $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. for $m_1$: $(\{g_1, g_2, g_5\}, \{m_1\})$
[Figure: concept lattice]
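The derivation operators and the two example concepts can be checked with a few lines of Python. This is only a minimal sketch: the cross positions in `CONTEXT` are an assumption (they are the ones consistent with the examples on these slides), and the function names are illustrative, not taken from any FCA library.

```python
# Running example context: object -> set of attributes it has.
# Assumed cross positions, chosen to agree with the slide examples.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def common_attributes(objects):
    """A': attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes):
    """B': objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """(g'', g') for a single object g."""
    intent = common_attributes({g})
    return common_objects(intent), intent


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    extent = common_objects({m})
    return extent, common_attributes(extent)


print(object_concept("g2"))
print(attribute_concept("m1"))
```

With this context, the object concept of g2 has extent {g2, g5} and intent {m1, m2}, and the attribute concept of m1 has extent {g1, g2, g5} and intent {m1}, matching the slide.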
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.
• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association rule $m_2 \rightarrow m_3$
  $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$
Implication $m_3 \Rightarrow m_2$
  $\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
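The support and confidence figures above can be reproduced with a short sketch. The context below is again an assumption (the cross positions consistent with the slide examples), and the helper names are illustrative.

```python
from fractions import Fraction

# Assumed running example context: object -> attributes.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': objects having all attributes of X."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (undefined when no object has X)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def is_implication(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(is_implication({"m3"}, {"m2"}), is_implication({"m2"}, {"m3"}))
```

Exact fractions are used so that 3/5 and 3/4 can be compared without rounding; an implication is simply an association rule with confidence 1.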
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric data
      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5
• Graphs and molecular structures
• Syntactic trees
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$
      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$
Pattern concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
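The interval example can be traced with a small sketch of an interval pattern structure. It assumes the three-object table (g1 to g3) used on the pattern-structure slides; the function names are illustrative, and subsumption is taken as the usual $d \sqsubseteq e \iff d \sqcap e = d$.

```python
from functools import reduce

# Numeric context restricted to the three objects of the slide example.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Describe an object by degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity: component-wise smallest interval covering both."""
    return tuple(
        (min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)
    )


def subsumed(d, e):
    """d ⊑ e: every interval of e lies inside the matching interval of d."""
    return all(a <= c and f <= b for (a, b), (c, f) in zip(d, e))


def description_of(objects):
    """A^□ for a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def objects_of(d):
    """d^□: all objects whose description is subsumed by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = description_of({"g1", "g2"})
print(d, objects_of(d))
```

Here g3 is excluded because its first value, 4, falls outside [5, 6], so ({g1, g2}, ⟨[5, 6], [7, 8], [4, 6]⟩) is closed on both sides and forms a pattern concept.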
Complex Data
• Numeric data: Interval Pattern Structures [Kaytoue et al. 2011]
• Graphs and molecular structures [Kuznetsov and Samokhin 2005]
• Syntactic trees [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources contain only RDF triples
• Some resources contain only schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on the RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
Roadmap
• Building a classification structure over the Web of Data
  - Classifying RDF data
  - Creating views through SPARQL queries
• Ways to utilize this structure
  - Data completion
  - Data analysis: interactive exploration and KDD through visualization
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: a subject node ($U \cup B$) linked to an object node ($U \cup B \cup L$) by a predicate edge ($U$); U = URI, B = blank nodes, L = literals]
Example of RDF Graph for Publications
[Figure: RDF graph with the nodes NapoliLS97, Amedeo_Napoli, title1 and Classification, and the edges dc:creator, dc:title and dc:subject]
To avoid confusion with RDF objects, objects in FCA are called entities in the following.
RDF to Entity-Descriptions
tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources
Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy $T$, with root $\top$ and the classes C1, C2 and C4 to C15]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of structured attribute sets?
• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query: an implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
Taxonomy (children indented under their parent):
⊤
  C12
    C10
      C1
      C2
    C11
      C4
      C5
  C6
    C13
      C7
      C8
      C9
Euler tour of the taxonomy with its depth array D. The lowest common ancestor (LCA) of two nodes is the node of minimal depth between their positions in the tour.
index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?
Positions in the Euler tour: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. {C1, C13}: positions 8 (C12) and 15 (⊤) are more general than C1 and are dropped, and positions 19 and 21 both denote C13.
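The worked example can be reproduced with the sketch below. It is not the authors' implementation: it assumes the taxonomy is a tree (reconstructed from the Euler tour on the slide), uses a naive linear-scan RMQ instead of a preprocessed structure, uses 0-based positions (the slide counts from 1), and only queries neighbouring positions that come from different sets. The names `CHILDREN`, `euler_tour`, `rmq`, `lca` and `meet` are illustrative.

```python
# Taxonomy from the slide, assumed to be a tree: parent -> ordered children.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(children, root):
    """Return the tour, its depth array D and each node's first position."""
    nodes, depths, first = [], [], {}

    def visit(node, depth):
        first.setdefault(node, len(nodes))
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths, first


NODES, DEPTHS, FIRST = euler_tour(CHILDREN, "TOP")


def rmq(i, j):
    """Naive range minimum query: position of the smallest depth in [i, j]."""
    return min(range(i, j + 1), key=DEPTHS.__getitem__)


def lca(u, v):
    """Lowest common ancestor via RMQ over the depth array."""
    i, j = sorted((FIRST[u], FIRST[v]))
    return NODES[rmq(i, j)]


def meet(a, b):
    """Most specific common generalizations of two antichains a and b."""
    tagged = sorted(
        [(FIRST[x], 0, x) for x in a] + [(FIRST[x], 1, x) for x in b]
    )
    candidates = {
        lca(x, y)
        for (_, side_x, x), (_, side_y, y) in zip(tagged, tagged[1:])
        if side_x != side_y  # only neighbouring pairs mixing A and B
    }
    # Drop every candidate that is a proper ancestor of another candidate.
    return {
        c for c in candidates
        if not any(d != c and lca(c, d) == c for d in candidates)
    }


print(meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

For A = {C1, C5, C8} and B = {C1, C7, C9} the candidates are C1, C12, ⊤ and C13, and filtering leaves {C1, C13}, as on the slide. A production version would replace the linear scan in `rmq` with a sparse table or similar preprocessing.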
Resulting Pattern Concept Lattice
[Figure: entity descriptions s1 to s5 and the ACM Computing Classification taxonomy, as on the "RDF to Entity-Descriptions" slide]
[Figure: pattern concept lattice with the concepts K0 to K11]
KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of the pattern concept lattice
Experimentation
Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data
Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How can we allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: the pattern concept lattice K0 to K11 and its extent/intent table, as on the "Resulting Pattern Concept Lattice" slide]
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
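One simple way to picture this interaction is to hide every concept whose intent mentions the unwanted class. The slides do not say how the navigation space is actually altered, so the sketch below is an assumed reading, not the method of the cited paper; `CONCEPTS` copies the extent/intent table and `hide` is a hypothetical helper.

```python
# Extent/intent table of the pattern concept lattice (K1 to K10).
CONCEPTS = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide(concepts, unwanted):
    """Keep only the concepts whose intent does not contain `unwanted`."""
    return {
        kid: (extent, intent)
        for kid, (extent, intent) in concepts.items()
        if unwanted not in intent
    }


visible = hide(CONCEPTS, "C7")
hidden = sorted(set(CONCEPTS) - set(visible))
print(hidden)
```

Under this reading, K3, K6 and K7 disappear from the navigation space, while concepts that only carry a more general class than C7 stay visible.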
4 Creating Views over RDF-Graphs
SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }
[Figure: query graph pattern with ?paper linked to ?author, ?title and ?keywords by dc:creator, dc:title and dc:subject]
Mapping $\mu: V \rightarrow U$
Given the set of URIs $U$ and the set of variables $V$, a mapping $\mu$ is a partial function $\mu: V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification
[Figure: matching RDF graph with the nodes NapoliLS97, Amedeo_Napoli, title1 and Classification, and the edges dc:creator, dc:title and dc:subject]
Lattice-Based View Access [Alam et al. CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1
• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \ldots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
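How a VIEW BY clause turns a result table into a formal context can be sketched in a few lines. This is an illustration under assumptions, not the implementation behind the cited paper: `ROWS` copies the result table above, `view_by` is a hypothetical helper, and attributes are stored as (variable, value) pairs so that the author and keyword attribute sets stay distinct.

```python
from collections import defaultdict

# Result rows of the query above, one dict per solution mapping.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
]


def view_by(rows, entity_var):
    """Build a context: values of `entity_var` become the entities (G),
    the values of all other variables become their attributes (M)."""
    context = defaultdict(set)
    for row in rows:
        entity = row[entity_var]
        for var, value in row.items():
            if var != entity_var:
                context[entity].add((var, value))
    return dict(context)


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(by_title["title1"])
print(by_author["author2"])
```

Calling `view_by` with a different variable yields a different context, and therefore a different lattice, over the same query result.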
Views from Different Perspectives
Example
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
Example
    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern with ?x linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a date ≤ 1900 by dbp:birthDate]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }
French Films
[Figure: graph pattern with ?x linked to FrenchFilm by dcterms:subject]
    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }
[Figure: alternative graph pattern with ?x linked to Film by rdf:type and to France by dbo:hasCountry]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }
From RDF Triples to Formal Context
RDF triples
    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientists>
Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom
[Table: formal context with the entities Person1 to Person7 as rows and the attributes a to g, grouped under the predicates A to E, as columns. Person1 has 6 crosses, Person2 and Person3 have 5, Person4 and Person5 have 4, Person6 and Person7 have 2.]
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete
Rule        Confidence  Support  Meaning
d ⇒ c       100%        5        Every scientist has won a Turing Award
c ⇒ d       100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c, d    100%        2        Every person whose field is computer science is a Turing Award-winning scientist
c, d ⇒ a, b 100%        5        All scientists who have won the Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d 71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact, there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
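The completion argument can be traced on a small context. The context below is hypothetical: the crosses for a, b, c and d follow from the rule table, while the placement of e, f and g is an assumption chosen only to match the stated supports and row sizes. The helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical running example: person -> attributes (letters a to g).
# The positions of e, f and g are assumed, not taken from the slides.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attrs):
    """Persons having all attributes in `attrs`."""
    return {p for p, desc in CONTEXT.items() if set(attrs) <= desc}


def holds(x, y):
    """The implication x => y holds iff x' is a subset of y'."""
    return extent(x) <= extent(y)


def confidence(x, y):
    """Confidence of the association rule x -> y."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def incomplete_entities(x, y):
    """Entities that have x but lack y: candidates for completion."""
    return extent(x) - extent(y)


print(holds("cd", "ab"), holds("ab", "cd"))
print(float(confidence("ab", "cd")))
print(sorted(incomplete_entities("ab", "cd")))
```

Here c, d ⇒ a, b holds, while a, b → c, d only reaches confidence 5/7 ≈ 0.71. The two counterexamples, Person6 and Person7, are exactly the entities whose descriptions would be completed if a, b ≡ c, d is accepted as a definition.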
Possible Scenario [Alam et al. IJCAI 2015]
[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
Remaining predicates, in row order: layout, releaseDate, gdpPppRank
FPS = Front_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• There is no ground truth for the triples to be added to each dataset
• Human evaluation serves as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities
Can we perform data analysis over Web of Data
Possible Use-Cases
sbquo groups of authors working together
sbquo detect the diversity of an author
sbquo detect major area of research for an author
sbquo given a paper is it possible to retrieve similar papers published
What do we need
sbquo A strong formalism which not only allows information retrieval but alsoallows classication and knowledge discovery
sbquo A structure which- is close to the original format of the web of data- allows search and navigation- Enable interaction with the domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 6
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Contributions
sbquo Building a classication structure over Web of Data- Classifying RDF Data- Creating Views through SPARQL Queries
sbquo Ways to utilize this structure- Data Completion- Data Analysis Interactive Exploration and KDD through visualization
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 7
Related Work
Our contributions are in the direction of the following works
sbquo Sewelis (Camelis 2) [Ferreacute and Hermann 2012]
sbquo Navigala [Visani et al 2011]
sbquo Relational Exploration [Rudolph 2006]
Mehwish Alam Interactive Knowledge Discovery over Web of Data 8
Mehwish Alam Interactive Knowledge Discovery over Web of Data 9
2Background
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
sbquo Galois connection
A1 ldquo tm P M | g P A Ď G pg mq P Iu
B 1 ldquo tg P G | m P B Ď M pg mq P Iu
sbquo pABq is a formal concept with extent A ldquo B 1
and intent B ldquo A1
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
sbquo Galois connection
A1 ldquo tm P M | g P A Ď G pg mq P Iu
B 1 ldquo tg P G | m P B Ď M pg mq P Iu
sbquo pABq is a formal concept with extent A ldquo B 1
and intent B ldquo A1
sbquo Reduced Labeling- intents are inherited from top to bottom(top-down)
- and extents are inherited from bottom totop (bottom-up)
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Concept Lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo Object Concept for g P G - Object concept is given as pg2 g 1q- eg ptg2 g5u tm1m2uq
sbquo Attribute Concept for m P M- Attribute concept is given as pm1m2q- eg ptg1 g2 g5u tm1uq
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Concept Lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 11
Association Rules and Implications
Let pG M I q be a formal context and X Y Ď Msbquo X Ntilde Y is an association rule with
- Support σpX Ntilde Y q ldquo|X 1XY 1|
|G |
- Confidence conf pX Ntilde Y q ldquo|X 1XY 1|
|X 1|
sbquo The implication X ugraventilde Y holds i X 1 Ď Y 1
Association Rulem2 Ntilde m3
σpm2 Ntilde m3q ldquo 35 ldquo 60conf pm2 Ntilde m3q ldquo 34 ldquo 75
Implication
m3 ugraventilde m2
conf pm3 ugraventilde m2q ldquo 33 ldquo 100
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 12
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Complex Data
Formal Concept Analysis cannot deal with such complex data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5
sbquo δpg1q ldquo xr5 5s r7 7s r6 6sy
sbquo δpg2q ldquo xr6 6s r8 8s r4 4sy
sbquo ra1 b1s[ra2 b2s ldquo rminpa1 a2qmaxpb1 b2qs
sbquo δpg1q [ δpg2q ldquo xr5 6s r7 8s r4 6sy
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object.
  Subject (U ∪ B) --predicate (U)--> Object (U ∪ B ∪ L)
  U = URIs, B = Blank Nodes, L = Literals
Example of RDF Graph for Publications
  NapoliLS97 --dc:creator--> Amedeo_Napoli
  NapoliLS97 --dc:title--> title1
  NapoliLS97 --dc:subject--> Classification
To avoid confusion with RDF objects, we will call the objects of FCA entities.
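The membership condition in the definition can be written as a small type check. This is only an illustrative sketch: the classes URI, BlankNode and Literal and the function is_rdf_triple are hypothetical names, not part of any RDF library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

def is_rdf_triple(s, p, o):
    """Check that (s, p, o) lies in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (isinstance(s, (URI, BlankNode))
            and isinstance(p, URI)
            and isinstance(o, (URI, BlankNode, Literal)))

# One edge of the publication graph, and a triple with a literal subject.
print(is_rdf_triple(URI("NapoliLS97"), URI("dc:creator"), URI("Amedeo_Napoli")))
print(is_rdf_triple(Literal("title1"), URI("dc:title"), URI("NapoliLS97")))
```

A literal may only appear in the object position, which is why the second call is rejected.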
RDF to Entity-Descriptions
  tid  Subject  Predicate        Object       Provenance
  t1   s1       dc:subject       o11          DBLP
  t2   s2       dc:subject       o16          DBLP
  t3   s1       rdf:type         Publication  DBLP
  t4   o11      rdfs:subClassOf  C1           ACCS
  t5   o12      rdfs:subClassOf  C2           ACCS
  t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources
  Entities S  Descriptions d
  s1          (dc:subject, {C1, C2, C7})
  s2          (dc:subject, {C6, C8, C9})
  s3          (dc:subject, {C4, C5})
  s4          (dc:subject, {C4, C7, C8})
  s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification Taxonomy T over the nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
Taxonomy:
  ⊤
    C12
      C10: C1, C2
      C11: C4, C5
    C6
      C13: C7, C8, C9
Euler tour of the taxonomy, with the depth array D:
  Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
  Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
  D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
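The tour and the depth array D can be produced by a depth-first walk that writes a node down again each time the walk returns to it from a child. The sketch below assumes the tree shape implied by the tour (⊤ has children C12 and C6, and so on); CHILDREN, euler_tour and the label "TOP" for ⊤ are hypothetical names, and positions are 0-based while the slide counts from 1.

```python
# Taxonomy read off the tour; "TOP" stands for the root ⊤.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(children, root):
    """Return (tour, depth, first) for a depth-first walk that records a
    node on entry and again after each of its children has been visited."""
    tour, depth, first = [], [], {}

    def visit(node, d):
        first.setdefault(node, len(tour))  # 0-based first occurrence
        tour.append(node)
        depth.append(d)
        for child in children.get(node, []):
            visit(child, d + 1)
            tour.append(node)              # back at `node` after the child
            depth.append(d)

    visit(root, 0)
    return tour, depth, first

TOUR, DEPTH, FIRST = euler_tour(CHILDREN, "TOP")
print(len(TOUR))
print(DEPTH)
print({c: FIRST[c] + 1 for c in ("C1", "C5", "C8")})  # 1-based, as on the slide
```

The lowest common ancestor of two classes is then the node at the position of the smallest depth between their first occurrences.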
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?
  Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
  Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
  D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Positions of first occurrence: A = {4, 12, 20}, B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
The positions 4, 8, 15, 19 and 21 hold C1, C12, ⊤, C13 and C13; keeping only the most specific classes leaves {C1, C13}.
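The worked example can be replayed in code. This is a sketch under stated assumptions rather than the published algorithm: rmq is a plain linear scan (a real implementation would use a preprocessed structure), only neighbouring positions that come from different sets are queried, and the final filter drops every class that is a proper ancestor of another result. The names TOUR, DEPTH, FIRST, rmq, lca, most_specific, meet and meet_bruteforce are hypothetical, "TOP" stands for ⊤, and positions are 0-based.

```python
# Euler tour and depth array copied from the slide ("TOP" stands for ⊤).
TOUR = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
        "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
FIRST = {}
for position, node in enumerate(TOUR):
    FIRST.setdefault(node, position)  # 0-based; the slide counts from 1

def rmq(i, j):
    """Position of the smallest depth in DEPTH[i..j], by linear scan."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=DEPTH.__getitem__)

def lca(u, v):
    """Lowest common ancestor via the first occurrences of u and v."""
    return TOUR[rmq(FIRST[u], FIRST[v])]

def most_specific(nodes):
    """Drop every node that is a proper ancestor of another node."""
    return {n for n in nodes
            if not any(n != other and lca(n, other) == n for other in nodes)}

def meet(a, b):
    """Similarity of two class sets using neighbouring positions only."""
    merged = sorted([(FIRST[n], "A") for n in a] + [(FIRST[n], "B") for n in b])
    candidates = set()
    for (i, src_i), (j, src_j) in zip(merged, merged[1:]):
        if src_i != src_j:  # only pairs that mix A and B
            candidates.add(TOUR[rmq(i, j)])
    return most_specific(candidates)

def meet_bruteforce(a, b):
    """Reference version: LCA of every pair, then the same filter."""
    return most_specific({lca(x, y) for x in a for y in b})

A, B = {"C1", "C5", "C8"}, {"C1", "C7", "C9"}
print(meet(A, B))
print(meet(A, B) == meet_bruteforce(A, B))
```

Querying only neighbours keeps the number of RMQ calls linear in |A| + |B|, while the brute-force variant needs |A| · |B| calls; the comparison at the end is a sanity check on this example, not a proof.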
Resulting Pattern Concept Lattice
  Entities S  Descriptions d
  s1          (dc:subject, {C1, C2, C7})
  s2          (dc:subject, {C6, C8, C9})
  s3          (dc:subject, {C4, C5})
  s4          (dc:subject, {C4, C7, C8})
  s5          (dc:subject, {C8, C9})
[Figure: taxonomy over the nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
[Figure: Pattern Concept lattice, levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
  KID  Extent          Intent
  K1   s1, s2, s4, s5  (p1, {C14})
  K2   s1, s3, s4      (p1, {C12})
  K3   s1, s4          (p1, {C7, C12})
  K4   s2, s4, s5      (p1, {C8})
  K5   s3, s4          (p1, {C4})
  K6   s1              (p1, {C1, C2, C7})
  K8   s2, s5          (p1, {C8, C9})
  K7   s4              (p1, {C4, C7, C8})
  K9   s3              (p1, {C4, C5})
  K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
  Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
  DBLP             5293  33207  33198        10134    45s    21s
  Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data
  Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
  BK       96        5           35   626   10           840897   37 sec     42 sec
  LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
  NT       131       3           131  140   6            128624   36 sec     68 sec
  PO       60        16          22   1236  58           416837   49 sec     57 sec
  PT       5000      49          22   4084  60           452316   50 sec     38 sec
  PW       200       11          94   436   21           1148656  60 sec     49 sec
  PY       74        28          36   340   53           771569   46 sec     40 sec
  QU       2178      4           44   8212  8            783013   28 sec     30 sec
  TZ       186       61          31   626   88           650041   58 sec     43 sec
  VY       52        4           52   202   15           202666   59 sec     116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from domain expert
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: Pattern Concept lattice, levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
  KID  Extent          Intent
  K1   s1, s2, s4, s5  (p1, {C14})
  K2   s1, s3, s4      (p1, {C12})
  K3   s1, s4          (p1, {C7, C12})
  K4   s2, s4, s5      (p1, {C8})
  K5   s3, s4          (p1, {C4})
  K6   s1              (p1, {C1, C2, C7})
  K8   s2, s5          (p1, {C8, C9})
  K7   s4              (p1, {C4, C7, C8})
  K9   s3              (p1, {C4, C5})
  K10  s2              (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }
Graph pattern:
  ?paper --dc:creator--> ?author
  ?paper --dc:title--> ?title
  ?paper --dc:subject--> ?keywords
Mapping µ: V → U
Given U and V, a mapping µ is a partial function µ: V → U, e.g. µ1: ?keywords → Classification
Matching RDF graph:
  NapoliLS97 --dc:creator--> Amedeo_Napoli
  NapoliLS97 --dc:title--> title1
  NapoliLS97 --dc:subject--> Classification
Lattice-Based View Access [Alam et al. CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
  ?title  ?keyword            ?author
  title1  RDF                 author1
  title2  Web Crawling        author1
  title2  Web Search Engines  author2
  title2  RDF                 author1
  title2  RDF                 author2
  title2  Web Crawling        author1
• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2, …}
• M1 = µ(Av1) = {author1, author2, …}
• M2 = µ(Av2) = {Web Crawling, RDF, …}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
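How VIEW BY splits the answer table into entities and attributes can be sketched in a few lines. This is an illustration only, assuming the result rows shown for the query; ROWS, COLUMNS and view_by are hypothetical names, and each attribute is kept as a (variable, value) pair so that author and keyword values stay distinguishable.

```python
from collections import defaultdict

COLUMNS = ("title", "keyword", "author")
ROWS = [  # result rows copied from the answer table
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]

def view_by(rows, columns, entity_column):
    """Group rows by the VIEW BY variable: its values become the entities,
    the values of all other variables become their attributes."""
    idx = columns.index(entity_column)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != idx:
                context[row[idx]].add((columns[i], value))
    return dict(context)

by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title))                 # entities when viewing by title
print(sorted(by_author["author1"]))     # attributes of author1 when viewing by author
```

Changing the third argument is all it takes to look at the same answers from another perspective.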
Views from Different Perspectives
Example
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
Example
  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
  Graph pattern: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> (≤ 1900)
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }
French Films
  Graph pattern: ?x --dcterms:subject--> FrenchFilm
  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm .
  }
French Films, described by type and country instead of by category
  Graph pattern: ?x --rdf:type--> Film, ?x --dbo:hasCountry--> France
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France .
  }
From RDF Triples to Formal Context
RDF triples
  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientists>
  Predicates             Objects
  Index  URI             Index  URI
  A      dc:subject      a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
  B      dbp:award       c      dbp:TuringAward
  C      rdf:type        d      dbo:Scientist
  D      dbp:field       e      dbp:Computer Sciences
  E      dbp:birthPlace  f      dbo:UnitedStates
                         g      dbo:UnitedKingdom
[Table: formal context with rows Person1 to Person7 and columns a, b (A), c (B), d (C), e (D), f, g (E). Crosses per row: Person1 6, Person2 5, Person3 5, Person4 4, Person5 4, Person6 2, Person7 2]
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y and Y → X has high confidence, the data may be incomplete
  Rule         Confidence  Support  Meaning
  d ⟹ c        100%        5        Every scientist has won a Turing Award
  c ⟹ d        100%        5        Every person who has won a Turing Award is a scientist
  e ⟹ c, d     100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
  c, d ⟹ a, b  100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
  a, b → c, d  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
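The reasoning behind the rule table can be checked on a reduced context. The sketch below is an assumption-laden toy: it keeps only attributes a, b, c and d, gives them to Person1 through Person5, and leaves Person6 and Person7 with a and b only, which is consistent with the confidences in the table but is not the full context. CONTEXT, extent, confidence and needs_completion are hypothetical names.

```python
# Toy context restricted to attributes a, b, c, d (an assumption that
# matches the reported confidences; e, f and g are left out).
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})

def extent(attributes):
    """Entities that carry every attribute in `attributes`."""
    return {g for g, ms in CONTEXT.items() if attributes <= ms}

def confidence(x, y):
    """conf(X → Y) = |X′ ∩ Y′| / |X′| (0.0 when no entity has X)."""
    covered = extent(x)
    return len(extent(x | y)) / len(covered) if covered else 0.0

def needs_completion(x, y):
    """Given X ⟹ Y, the entities having Y but not X: adding X to them
    would turn the near-implication Y → X into an implication."""
    return extent(y) - extent(x)

X, Y = {"c", "d"}, {"a", "b"}
print(confidence(X, Y))                  # c, d ⟹ a, b
print(round(confidence(Y, X), 2))        # a, b → c, d
print(sorted(needs_completion(X, Y)))    # candidates for completion
```

Whether the missing attributes should really be added is a decision for the domain expert; the code only lists the candidates.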
Possible Scenario [Alam et al. IJCAI 2015]
Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
  Dataset       Cars              Videogames   Smartphones        Countries
  Restriction   dc:subject        dc:subject   dc:subject         rdf:type
                dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones   Country
  Predicates    rdf:type          rdf:type     rdf:type           rdf:type
                dc:subject        dc:subject   dc:subject         dc:subject
                bodyStyle         cp           manufacturer       language
                transmission      developer    operativeSystem    govenmentType
                assembly          requirement  developer          leaderType
                designer          genre        cpu                foundingDate
                layout            releaseDate                     gdpPppRank
  FPS: Front_Person_Shooters, cp: computerPlatform
Predicates used to construct each dataset
  Dataset        Cars   Videogames  Smartphones  Countries
  Subjects       529    655         363          3153
  Objects        1291   3265        495          8315
  Triples        12519  20146       4710         50000
  Concepts       14657  31031       1232         13754
  Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for Cars, Smartphones, Countries and Videogames]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Contributions
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
Interactive Knowledge Discovery + Completion
Related Work
Our contributions are in the direction of the following works
• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis [Ganter and Wille 1999]
• A formal context is a triple (G, M, I)
  - G is a set of objects
  - M is a set of attributes
  - I ⊆ G × M is a binary relation
• Galois connection
  A′ = {m ∈ M | ∀g ∈ A ⊆ G, (g, m) ∈ I}
  B′ = {g ∈ G | ∀m ∈ B ⊆ M, (g, m) ∈ I}
• (A, B) is a formal concept with extent A = B′ and intent B = A′
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - and extents are inherited from bottom to top (bottom-up)
      m1  m2  m3
  g1   ×
  g2   ×   ×
  g3       ×   ×
  g4       ×   ×
  g5   ×   ×   ×
[Figure: Concept Lattice]
Formal Concept Analysis [Ganter and Wille 1999]
• Object Concept for g ∈ G
  - Object concept is given as (g″, g′)
  - e.g. ({g2, g5}, {m1, m2})
• Attribute Concept for m ∈ M
  - Attribute concept is given as (m′, m″)
  - e.g. ({g1, g2, g5}, {m1})
      m1  m2  m3
  g1   ×
  g2   ×   ×
  g3       ×   ×
  g4       ×   ×
  g5   ×   ×   ×
[Figure: Concept Lattice]
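The two derivation operators and the object and attribute concepts can be computed directly from a context. In this sketch the cross positions are an assumption: they were rebuilt from the examples and rule values given on the slides (g1 has m1; g2 has m1, m2; g3 and g4 have m2, m3; g5 has all three). CONTEXT, ATTRIBUTES, intent, extent, object_concept and attribute_concept are hypothetical names.

```python
# Context rebuilt from the slide examples (cross positions are assumed).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}

def intent(objects):
    """A′: the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared

def extent(attributes):
    """B′: the objects that have every attribute in `attributes`."""
    return {g for g, ms in CONTEXT.items() if set(attributes) <= ms}

def object_concept(g):
    """(g″, g′): the most specific concept whose extent contains g."""
    b = intent({g})
    return extent(b), b

def attribute_concept(m):
    """(m′, m″): the most general concept whose intent contains m."""
    a = extent({m})
    return a, intent(a)

print(object_concept("g2"))
print(attribute_concept("m1"))
```

Both calls reproduce the example concepts quoted for g2 and m1.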
Association Rules and Implications
Let (G, M, I) be a formal context and X, Y ⊆ M
• X → Y is an association rule with
  - Support σ(X → Y) = |X′ ∩ Y′| / |G|
  - Confidence conf(X → Y) = |X′ ∩ Y′| / |X′|
• The implication X ⟹ Y holds iff X′ ⊆ Y′
Association Rule: m2 → m3
  σ(m2 → m3) = 3/5 = 60%
  conf(m2 → m3) = 3/4 = 75%
Implication: m3 ⟹ m2
  conf(m3 ⟹ m2) = 3/3 = 100%
      m1  m2  m3
  g1   ×
  g2   ×   ×
  g3       ×   ×
  g4       ×   ×
  g5   ×   ×   ×
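Support, confidence and the implication test follow directly from the extents. The sketch below assumes the same reconstructed context as the worked values (m2 is held by g2, g3, g4, g5 and m3 by g3, g4, g5); CONTEXT, extent, support, confidence and implies are hypothetical names.

```python
# Context assumed from the worked values: |m2′| = 4, |m3′| = 3, |G| = 5.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}

def extent(attributes):
    """Objects that have every attribute in `attributes`."""
    return {g for g, ms in CONTEXT.items() if attributes <= ms}

def support(x, y):
    """σ(X → Y) = |X′ ∩ Y′| / |G|; X′ ∩ Y′ equals the extent of X ∪ Y."""
    return len(extent(x | y)) / len(CONTEXT)

def confidence(x, y):
    """conf(X → Y) = |X′ ∩ Y′| / |X′| (0.0 when X′ is empty)."""
    covered = extent(x)
    return len(extent(x | y)) / len(covered) if covered else 0.0

def implies(x, y):
    """X ⟹ Y holds iff X′ ⊆ Y′."""
    return extent(x) <= extent(y)

print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(confidence({"m3"}, {"m2"}), implies({"m3"}, {"m2"}))
```

An implication is simply an association rule whose confidence is 1, which is why m3 ⟹ m2 holds while m2 → m3 remains a rule.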
Complex Data
Formal Concept Analysis cannot deal with such complex data
      m1  m2  m3
  g1   5   7   6
  g2   6   8   4
  g3   4   8   5
  g4   4   9   8
  g5   5   8   5
• Numeric Data
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures [Ganter and Kuznetsov 2001]
• A pattern structure (G, (D, ⊓), δ) is composed of
  - G, a set of objects
  - (D, ⊓), the set of descriptions with a similarity operation ⊓ over them
  - δ, a mapping such that δ(g) ∈ D describes object g
      m1  m2  m3
  g1   5   7   6
  g2   6   8   4
  g3   4   8   5
• δ(g1) = ⟨[5, 5], [7, 7], [6, 6]⟩
• δ(g2) = ⟨[6, 6], [8, 8], [4, 4]⟩
• [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]
• δ(g1) ⊓ δ(g2) = ⟨[5, 6], [7, 8], [4, 6]⟩
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?
index   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
node    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
• Positions in the tour: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
• Merged and sorted: $A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
• RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the nodes C1 and C13
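The steps on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in the experiments: the RMQ is a naive linear scan, the root ⊤ is written `TOP`, and the helper names `rmq`, `first_pos`, `is_ancestor` and `similarity` are made up for the example.

```python
# Euler tour of the taxonomy and the depth of each visited node
# (positions are 1-based, as on the slide).
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the minimal depth in DEPTH[i..j] (naive scan)."""
    return min(range(i, j + 1), key=lambda k: DEPTH[k - 1])


def first_pos(node):
    """1-based position of the first visit of `node` in the tour."""
    return TOUR.index(node) + 1


def is_ancestor(a, b):
    """True if `a` is an ancestor of `b` (or a == b), i.e. LCA(a, b) is `a`."""
    i, j = sorted((first_pos(a), first_pos(b)))
    return TOUR[rmq(i, j) - 1] == a


def similarity(A, B):
    """Most specific common generalisations of two sets of taxonomy nodes."""
    merged = sorted([first_pos(c) for c in A] + [first_pos(c) for c in B])
    # One RMQ per pair of consecutive positions in the merged list.
    candidates = [rmq(i, j) for i, j in zip(merged, merged[1:])]
    nodes = {TOUR[k - 1] for k in candidates}
    # Drop every node that is a proper ancestor of another candidate.
    return {n for n in nodes
            if not any(m != n and is_ancestor(n, m) for m in nodes)}


print(similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

A sparse table or another constant-time RMQ structure would normally replace the linear scan; the scan is kept here only to make the correspondence with the slide easy to follow.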
Resulting Pattern Concept Lattice
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: taxonomy with root ⊤ and nodes C1, C2 and C4 to C15]
[Figure: Pattern concept lattice with concepts K0 to K11]
KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})
Details of the pattern concept lattice
Experimentation
Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s
Experiments on Linked Data
Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from a domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: Pattern concept lattice with concepts K0 to K11]
KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }
[Figure: graph pattern of the query: ?paper linked to ?author (dc:creator), ?title (dc:title) and ?keywords (dc:subject)]
Mapping $\mu : V \rightarrow U$
Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification
[Figure: matching subgraph: NapoliLS97 linked to Amedeo_Napoli (dc:creator), title1 (dc:title) and Classification (dc:subject)]
Lattice-Based View Access [Alam et al. CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1
• $E_v = \{\text{title}\}$
• $A_v = \{\text{author}, \text{keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}, \dots\}$
• $M_1 = \mu(A_{v1}) = \{\text{author1}, \text{author2}, \dots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
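One way to read VIEW BY is as a pivot of the result table: the values of the chosen variable become the objects, and the values of the remaining variables become the attributes. The sketch below illustrates this reading under stated assumptions: the function name `view_by` is hypothetical, only three sample rows are used, and all attributes are kept in one context tagged with their variable rather than in separate sets $M_1$ and $M_2$.

```python
from collections import defaultdict

# A few result rows in the style of the slide: (title, keyword, author).
VARIABLES = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows, variable):
    """Build a context whose objects are the values bound to `variable`."""
    key = VARIABLES.index(variable)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != key:
                # Tag each attribute with the variable it comes from.
                context[row[key]].add((VARIABLES[i], value))
    return dict(context)


by_title = view_by(ROWS, "title")    # objects: title1, title2
by_author = view_by(ROWS, "author")  # objects: author1, author2
print(by_title["title2"])
print(by_author["author1"])
```

Changing the VIEW BY variable changes which column supplies the objects, which is why the same answer set can yield different lattices.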
Views from Different Perspectives
Example
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
Example
    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: ?x linked to Person (rdf:type), to Berlin (dbp:birthPlace) and to a date ≤ 1900 (dbp:birthDate)]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }
French Films
[Figure: ?x linked to FrenchFilm (dcterms:subject)]
    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }
[Figure: ?x linked to FrenchFilm (dcterms:subject), to Film (rdf:type) and to France (dbo:hasCountry)]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>
Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom
Cross table over the columns A (a, b), B (c), C (d), D (e), E (f, g):
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
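The construction described on this slide, one cross per triple, could look roughly like this in Python. It is a minimal sketch: the function names `formal_context` and `cross_table` are invented for the example, prefixed names are kept as plain strings, and only the four sample triples shown on the slide are used.

```python
from collections import defaultdict

# Sample triples from the slide (prefixed names kept as plain strings).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def formal_context(triples):
    """Map each subject to its set of (predicate, object) attributes."""
    context = defaultdict(set)
    for s, p, o in triples:
        context[s].add((p, o))  # one cross per triple
    return dict(context)


def cross_table(context):
    """Return the sorted attribute list and, per entity, a row of 'x' / '.'."""
    attributes = sorted({a for attrs in context.values() for a in attrs})
    rows = {g: ["x" if a in attrs else "." for a in attributes]
            for g, attrs in context.items()}
    return attributes, rows


context = formal_context(TRIPLES)
attributes, rows = cross_table(context)
print(attributes)
print(rows["Person1"])
```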
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ and $Y \rightarrow X$ has high confidence, the data may be incomplete
Rule                        Confidence   Support   Meaning
$d \Rightarrow c$           100%         5         Every scientist has won a Turing Award
$c \Rightarrow d$           100%         5         Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%         2         All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$     100%         5         All the Scientists winning Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
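The reasoning on this slide can be checked with a few lines of Python. The cross placement below is an assumption chosen to agree with the rule counts of the running example (a and b for all seven persons, c and d for Person1 to Person5); attributes e, f and g are left out, and the threshold value and the function names are hypothetical.

```python
# Assumed cross placement, consistent with the rule counts on the slide.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains every attribute in `attrs`."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(X, Y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'|."""
    return len(extent(X) & extent(Y)) / len(extent(X))


def completion_candidates(X, Y, threshold=0.7):
    """If X => Y holds and Y -> X is confident enough, return the entities
    that have Y but lack part of X; they are candidates for receiving X."""
    if extent(X) <= extent(Y) and confidence(Y, X) >= threshold:
        return extent(Y) - extent(X)
    return set()


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(sorted(completion_candidates({"c", "d"}, {"a", "b"})))
```

Whether a candidate should really be completed is a decision for the expert; the code only lists the entities that break the would-be definition.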
Possible Scenario [Alam et al. IJCAI 2015]
[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset       Cars                Videogames    Smartphones        Countries
Restriction   dc:subject          dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars    dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type            rdf:type      rdf:type           rdf:type
              dc:subject          dc:subject    dc:subject         dc:subject
              bodyStyle           cp            manufacturer       language
              transmission        developer     operativeSystem    governmentType
              assembly            requirement   developer          leaderType
              designer            genre         cpu                foundingDate
              layout              releaseDate                      gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Contributions
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
[Figure: Interactive Knowledge Discovery + Completion]
Related Work
Our contributions are in the direction of the following works:
• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - and extents are inherited from bottom to top (bottom-up)
     m1   m2   m3
g1   ×
g2   ×    ×
g3        ×    ×
g4        ×    ×
g5   ×    ×    ×
[Figure: Concept Lattice]
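The two derivation operators are short enough to write out directly. The sketch below is an illustration only: the cross positions are inferred from the worked examples elsewhere in this deck (the object concept of g2, the attribute concept of m1 and the rule counts for m2 and m3), the function names are made up, and the concepts are enumerated by brute force over all attribute subsets, which is only workable for tiny contexts.

```python
from itertools import combinations

# Formal context: object -> set of attributes it has
# (cross positions inferred from the deck's worked examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def common_attributes(objects):
    """A': attributes shared by every object in A (all of M if A is empty)."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes):
    """B': objects that have every attribute in B."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def formal_concepts():
    """All pairs (extent, intent) with extent = intent' and intent = extent'."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), size):
            extent = common_objects(subset)
            intent = common_attributes(extent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


for extent, intent in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```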
Formal Concept Analysis
[Ganter and Wille 1999]
• Object Concept for $g \in G$
  - Object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute Concept for $m \in M$
  - Attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
[Figure: Concept Lattice]
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association Rule
$m_2 \rightarrow m_3$
$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
$conf(m_2 \rightarrow m_3) = 3/4 = 75\%$
Implication
$m_3 \Rightarrow m_2$
$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
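The figures on this slide can be reproduced with the small Python sketch below. The context is the five-object example of the deck, with cross positions inferred from the counts above (m2 holds for four objects, three of which also have m3); the function names are illustrative only.

```python
# Five-object example context (cross positions inferred from the slide's counts).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attrs):
    """X': objects that have every attribute in X."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def support(X, Y):
    """sigma(X -> Y) = |X' ∩ Y'| / |G|."""
    return len(extent(X) & extent(Y)) / len(CONTEXT)


def confidence(X, Y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'|."""
    return len(extent(X) & extent(Y)) / len(extent(X))


def is_implication(X, Y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(X) <= extent(Y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(is_implication({"m3"}, {"m2"}), is_implication({"m2"}, {"m3"}))
```

An implication is simply an association rule with confidence 1, which is why $m_3 \Rightarrow m_2$ holds while $m_2 \rightarrow m_3$ remains a rule with exceptions (here g2).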
Complex Data
Formal Concept Analysis cannot deal with such complex data
• Numeric Data
       m1   m2   m3
  g1   5    7    6
  g2   6    8    4
  g3   4    8    5
  g4   4    9    8
  g5   5    8    5
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$
     m1   m2   m3
g1   5    7    6
g2   6    8    4
g3   4    8    5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects
  $A^{\square} = \sqcap_{g \in A}\, \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
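For interval descriptions, both operators fit in a few lines of Python. The sketch below uses the three-object numeric table (g1 to g3) from the preceding slide; the function names are invented for the example, and subsumption is tested through the usual identity that $d \sqsubseteq e$ holds exactly when $d \sqcap e = d$.

```python
from functools import reduce

# Three-object numeric table used in the interval example.
DATA = {"g1": [5, 7, 6], "g2": [6, 8, 4], "g3": [4, 8, 5]}


def delta(g):
    """Description of g: one degenerate interval [v, v] per attribute."""
    return [(v, v) for v in DATA[g]]


def meet(d1, d2):
    """Similarity of two interval vectors: component-wise convex hull."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]


def subsumed(d, e):
    """d is subsumed by e iff meet(d, e) == d (e's intervals lie inside d's)."""
    return meet(d, e) == d


def description(objects):
    """Maximal common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def objects_of(d):
    """All objects whose description is subsumed by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = description(["g1", "g2"])
print(d)
print(sorted(objects_of(d)))
```

Here g3 is excluded because its first value, 4, falls outside the interval [5, 6].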
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]
       m1   m2   m3
  g1   5    7    6
  g2   6    8    4
  g3   4    8    5
  g4   4    9    8
  g5   5    8    5
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) linked by a predicate ($U$) to Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: NapoliLS97 linked to Amedeo_Napoli (dc:creator), title1 (dc:title) and Classification (dc:subject)]
To avoid confusion, we will call objects in FCA entities.
RDF to Entity-Descriptions
tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS
RDF triples from several resources
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: ACM Computing Classification Taxonomy T, with root ⊤ and nodes C1, C2 and C4 to C15]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
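Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array. Applied to the depths recorded along an Euler tour of the taxonomy, the position of the minimal depth between two nodes is the position of their least common ancestor (LCA).
[Figure: taxonomy: ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]
Euler tour of the taxonomy, with the depth D of each visited node:
index   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
node    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0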
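The tour and the depth array can be produced by a depth-first traversal that records a node when it is entered and again after each of its children. The Python sketch below is an illustration under stated assumptions: the child lists are read off the Euler tour shown on this slide, the root ⊤ is written `TOP`, the names `euler_tour` and `lca` are made up, and the minimum is found by a plain scan rather than a preprocessed RMQ structure.

```python
# Taxonomy as child lists, read off the Euler tour on this slide.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (tour, depths): a node is recorded on entry and after each child."""
    tour, depths = [], []

    def visit(node, depth):
        tour.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            tour.append(node)  # back at `node` after exploring `child`
            depths.append(depth)

    visit(root, 0)
    return tour, depths


def lca(tour, depths, a, b):
    """Shallowest node between the first visits of a and b in the tour."""
    i, j = sorted((tour.index(a), tour.index(b)))
    k = min(range(i, j + 1), key=lambda t: depths[t])
    return tour[k]


tour, depths = euler_tour("TOP")
print(depths)
print(lca(tour, depths, "C1", "C5"), lca(tour, depths, "C5", "C7"))
```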
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title_1, title_2, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{author_1, author_2, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
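The effect of the VIEW BY variable can be sketched as a regrouping of the query's result rows: the chosen variable supplies the entities, and the remaining variables supply the attributes. The sketch below is only an illustration under assumptions. The function name `view_by`, the row dictionaries and the encoding of attributes as (variable, value) pairs are hypothetical and are not taken from the LBVA implementation.

```python
# Result rows of the example query, as listed on the slide.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, entity_var):
    """Group result rows into a context: entity -> set of (variable, value)."""
    context = {}
    for row in rows:
        attributes = context.setdefault(row[entity_var], set())
        for var, value in row.items():
            if var != entity_var:
                attributes.add((var, value))
    return context


# Same result set, two perspectives (hypothetical helper, not SPARQL itself).
by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
```

With `VIEW BY ?title` the entities are the titles and the attributes are authors and keywords; with `VIEW BY ?author` the roles of title and author are exchanged, which would yield a different lattice over the same answers.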
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern in which ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: graph pattern in which ?x has dcterms:subject FrenchFilm]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
[Figure: graph pattern in which ?x has dcterms:subject FrenchFilm, rdf:type Film and dbo:hasCountry France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer_Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

         A: a b | B: c | C: d | D: e | E: f g
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete

Rule                       Confidence  Support  Meaning
$d \Rightarrow c$          100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$          100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$       100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$    100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$            71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $\{a, b\} \equiv \{c, d\}$, i.e. $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$
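The reasoning on this slide can be sketched in a few lines. The context below is an assumption: it keeps only the attributes a, b, c, d and places the crosses as the rule table implies (a and b for all seven persons, c and d for Person1 to Person5). The helper names and the 0.7 threshold are hypothetical and only serve the illustration.

```python
# Assumed toy context restricted to attributes a, b, c, d.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})


def extent(attrs):
    """Entities that have every attribute in attrs."""
    return {g for g, row in CONTEXT.items() if attrs <= row}


def confidence(x, y):
    """Confidence of the association rule x -> y."""
    return len(extent(x | y)) / len(extent(x))


def candidate_definition(x, y, threshold):
    """x => y holds exactly and y -> x reaches the confidence threshold."""
    return extent(x) <= extent(y) and confidence(y, x) >= threshold


def entities_to_complete(x, y):
    """Entities described by y that still lack part of x."""
    return extent(y) - extent(x)


X, Y = {"c", "d"}, {"a", "b"}
is_candidate = candidate_definition(X, Y, threshold=0.7)  # threshold is an assumption
missing = entities_to_complete(X, Y)
```

Under these assumptions, `confidence(Y, X)` is 5/7, and the entities returned by `entities_to_complete` are the ones whose descriptions would have to be extended with c and d for the definition to hold.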
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars              Videogames   Smartphones        Countries
Restriction  dc:subject        dc:subject   dc:subject         rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones   Country
Predicates   rdf:type          rdf:type     rdf:type           rdf:type
             dc:subject        dc:subject   dc:subject         dc:subject
             bodyStyle         cp           manufacturer       language
             transmission      developer    operativeSystem    governmentType
             assembly          requirement  developer          leaderType
             designer          genre        cpu                foundingDate
             layout            releaseDate                     gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset        Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  1732    1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation serves as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Related Work
Our contributions are in the direction of the following works:
• Sewelis (Camelis 2) [Ferré and Hermann, 2012]
• Navigala [Visani et al., 2011]
• Relational Exploration [Rudolph, 2006]
2 Background
Formal Concept Analysis
[Ganter and Wille, 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
[Figure: Concept Lattice]
Formal Concept Analysis
[Ganter and Wille, 1999]
• Object Concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. for $g_2$: $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute Concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. for $m_1$: $(\{g_1, g_2, g_5\}, \{m_1\})$

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
[Figure: Concept Lattice]
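The two derivation operators and the object and attribute concepts can be sketched as follows. This is a minimal illustration, not a lattice-building algorithm. The cross positions in `CONTEXT` are an assumption: they are the only placement consistent with the examples given on the slides. All function names are hypothetical.

```python
# Assumed cross table (positions reconstructed from the slide's examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : attributes shared by every object in the set."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B' : objects that have every attribute in the set."""
    return {g for g, row in CONTEXT.items() if attributes <= row}


def object_concept(g):
    """(g'', g') for a single object g."""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    a = extent({m})
    return a, intent(a)
```

Calling `object_concept("g2")` and `attribute_concept("m1")` reproduces the two examples on the slide, which is a quick way to check the assumed cross table.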
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \to Y$ is an association rule with
  - Support $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association Rule
$m_2 \to m_3$
$\sigma(m_2 \to m_3) = 3/5 = 60\%$
$conf(m_2 \to m_3) = 3/4 = 75\%$
Implication
$m_3 \Rightarrow m_2$
$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
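The support and confidence values above can be recomputed with a short sketch. As before, the cross positions in `CONTEXT` are an assumption reconstructed from the slide's examples, and the function names are hypothetical.

```python
# Assumed cross table (positions reconstructed from the slide's examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X' : objects that have every attribute in X."""
    return {g for g, row in CONTEXT.items() if attributes <= row}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """|X' ∩ Y'| / |X'|"""
    return len(extent(x) & extent(y)) / len(extent(x))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)
```

An implication is simply an association rule with confidence 1, which is why `implication_holds({"m3"}, {"m2"})` is true here while the converse rule only reaches 75% confidence.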
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric Data
     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov, 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
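The interval similarity operation can be written directly from its definition. In this sketch a description is assumed to be a tuple of (low, high) pairs, one per numeric attribute; the names `describe` and `meet` are hypothetical.

```python
def describe(values):
    """delta(g): turn a row of numbers into degenerate intervals [v, v]."""
    return tuple((v, v) for v in values)


def meet(d1, d2):
    """Component-wise similarity: [min of lower bounds, max of upper bounds]."""
    return tuple(
        (min(a1, a2), max(b1, b2))
        for (a1, b1), (a2, b2) in zip(d1, d2)
    )


delta_g1 = describe((5, 7, 6))
delta_g2 = describe((6, 8, 4))
similarity = meet(delta_g1, delta_g2)
```

Here `similarity` is the smallest interval vector that covers both rows, matching the last bullet of the slide.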
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
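Both operators can be sketched for interval descriptions over the three-object table used in the pattern structure example. The sketch assumes that subsumption is tested through the usual identity $d \sqsubseteq e \iff d \sqcap e = d$; the names `DELTA`, `common_description` and `extent` are hypothetical.

```python
from functools import reduce

# Three-object numeric table from the slide, as interval descriptions.
DELTA = {
    "g1": ((5, 5), (7, 7), (6, 6)),
    "g2": ((6, 6), (8, 8), (4, 4)),
    "g3": ((4, 4), (8, 8), (5, 5)),
}


def meet(d1, d2):
    """Component-wise interval similarity."""
    return tuple(
        (min(a1, a2), max(b1, b2))
        for (a1, b1), (a2, b2) in zip(d1, d2)
    )


def common_description(objects):
    """A^square: similarity of the descriptions of a non-empty object set."""
    return reduce(meet, (DELTA[g] for g in objects))


def extent(description):
    """d^square: objects g with d subsumed by delta(g), i.e. d meet delta(g) == d."""
    return {g for g, d in DELTA.items() if meet(description, d) == description}


d = common_description({"g1", "g2"})
objects = extent(d)
```

Applying `extent` to the result of `common_description` returns the same two objects, so the pair is closed and forms a pattern concept; g3 is excluded because its m1 value 4 lies outside [5, 6].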
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al., 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]
• Syntactic Tree [Leeuwenberg et al., 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) linked by a predicate ($U$) to an Object ($U \cup B \cup L$), where U = URIs, B = Blank Nodes, L = Literals]
Example of RDF Graph for Publications
[Figure: RDF graph in which NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]
To avoid confusion, we will refer to the objects of FCA as entities.
RDF to Entity-Descriptions
tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy T over the nodes ⊤, C1, C2 and C4 to C15]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query: An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree with root ⊤, whose children are C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]
Euler tour of the tree with the depth D of each visited node:
i     1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D     0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node  ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?
i     1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D     0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node  ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the nodes $\{C_1, C_{13}\}$
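The procedure can be sketched end to end: build the Euler tour, answer each RMQ on the depth array, and keep only the most specific results. Several points are assumptions of this sketch rather than statements about the original implementation: the RMQ is a naive linear scan (a sparse table or similar would be used for large taxonomies), only consecutive positions coming from different sets are paired, indices are 0-based rather than 1-based, and the root is written "T". All names are hypothetical.

```python
# Taxonomy read off the Euler tour on the slide ("T" stands for the root).
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(children, root):
    """Return the visited nodes and their depths along an Euler tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour(CHILDREN, "T")
FIRST, LAST = {}, {}
for i, n in enumerate(NODES):
    FIRST.setdefault(n, i)
    LAST[n] = i


def lca(i, j):
    """Node of minimal depth between tour positions i and j (naive RMQ)."""
    lo, hi = min(i, j), max(i, j)
    return NODES[min(range(lo, hi + 1), key=DEPTHS.__getitem__)]


def is_strict_ancestor(u, v):
    return u != v and FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def similarity(a, b):
    """Most specific common ancestors of the anti-chains a and b."""
    merged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[x], "B") for x in b])
    found = {
        lca(i, j)
        for (i, src_i), (j, src_j) in zip(merged, merged[1:])
        if src_i != src_j  # assumption: pair only positions from different sets
    }
    return {u for u in found if not any(is_strict_ancestor(u, v) for v in found)}
```

For the sets on the slide, `similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"})` first collects C1, C12, the root and C13, then drops C12 and the root because they are ancestors of C1, leaving the same two nodes as positions 4 and 21 in the 1-based table.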
Resulting Pattern Concept Lattice
Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy T over the nodes ⊤, C1, C2 and C4 to C15]
[Figure: Pattern concept lattice over the concepts K0 to K11]

KID  Extent            Intent
K1   s1, s2, s4, s5    (p1, {C14})
K2   s1, s3, s4        (p1, {C12})
K3   s1, s4            (p1, {C7, C12})
K4   s2, s4, s5        (p1, {C8})
K5   s3, s4            (p1, {C4})
K6   s1                (p1, {C1, C2, C7})
K8   s2, s5            (p1, {C8, C9})
K7   s4                (p1, {C4, C7, C8})
K9   s3                (p1, {C4, C5})
K10  s2                (p1, {C6, C8, C9})
Details of the pattern concept lattice
Experimentation
Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec
Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach ()
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: Pattern concept lattice over the concepts K0 to K11]

KID  Extent            Intent
K1   s1, s2, s4, s5    (p1, {C14})
K2   s1, s3, s4        (p1, {C12})
K3   s1, s4            (p1, {C7, C12})
K4   s2, s4, s5        (p1, {C8})
K5   s3, s4            (p1, {C4})
K6   s1                (p1, {C1, C2, C7})
K8   s2, s5            (p1, {C8, C9})
K7   s4                (p1, {C4, C7, C8})
K9   s3                (p1, {C4, C5})
K10  s2                (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting denitions from implications and association rules
sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data
Rule Condence Support Meaning
d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a
Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-
rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and
Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071
sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b
Mehwish Alam Interactive Knowledge Discovery over Web of Data 38
Possible Scenario [Alam et al IJCAI 2015]
ReferenceUniverse
FormalContext
MiningImplications
RankingImplications
Can a rule be adenitionX rdquo Y
Yes
CompleteData
Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov
Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 39
Experimentation
Dataset Cars Videogames Smartphones Countries
Dataset building conditions
Restriction dcsubject dcsubject dcsubject rdftype
dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype
dcsubject dcsubject dcsubject dcsubject
bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank
Front_Person_Shooters computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset Cars Videogames Smartphones Countries
Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982
Dataset characteristics
Mehwish Alam Interactive Knowledge Discovery over Web of Data 40
Evaluation
sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall
sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions
00 02 04 06 08 10
06
07
08
09
10
Recall
Pre
cisi
on
CarsSmartphonesCountriesVideogames
Mehwish Alam Interactive Knowledge Discovery over Web of Data 41
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 42
Mehwish Alam Interactive Knowledge Discovery over Web of Data 43
6DataAnalysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities

2 Background
Formal Concept Analysis [Ganter and Wille 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I$ is a binary relation
• Galois connection
  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$
  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: Concept Lattice]
Formal Concept Analysis [Ganter and Wille 1999]
• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: Concept Lattice]
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association rule $m_2 \rightarrow m_3$
  $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
  $conf(m_2 \rightarrow m_3) = 3/4 = 75\%$
Implication $m_3 \Rightarrow m_2$
  $conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
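The derivation operators, support and confidence of this slide can be checked with a few lines of Python. This is only a minimal sketch: the dictionary `CONTEXT` and the function names are illustrative, and the positions of the crosses are an assumption inferred from the numbers quoted on the slides (3/5, 3/4, 3/3 and the two example concepts).

```python
# Running example context. The cross positions are an assumption
# inferred from the support/confidence values quoted on the slide.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def extent(attrs):
    """B': the objects that have every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if set(attrs) <= desc}


def intent(objs):
    """A': the attributes shared by every object in objs."""
    objs = list(objs)
    if not objs:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[g] for g in objs))


def support(x, y):
    """|X' ∩ Y'| / |G|."""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (assumes X' is not empty)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)
```

Under these assumptions, `support({"m2"}, {"m3"})` and `confidence({"m2"}, {"m3"})` reproduce the 60% and 75% of the slide, and `implication_holds({"m3"}, {"m2"})` is true while the converse is not.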
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric data
       m1  m2  m3
  g1   5   7   6
  g2   6   8   4
  g3   4   8   5
  g4   4   9   8
  g5   5   8   5
• Graphs and molecular structures
• Syntactic trees
• Linked Data
Pattern Structures [Ganter and Kuznetsov 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
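As an illustration, the interval pattern structure of the three-object example can be sketched in Python. The names `DATA`, `delta`, `meet`, `subsumes`, `common_description` and `matching_objects` are hypothetical helpers introduced here, not part of the presented work; the sketch assumes that $d \sqsubseteq e$ means that every interval of $e$ lies inside the corresponding interval of $d$.

```python
from functools import reduce

# Numeric table of the three-object example (g1, g2, g3).
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of an object: one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity of two descriptions: component-wise convex hull of intervals."""
    return tuple(
        (min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)
    )


def subsumes(d, e):
    """d is more general than e: each interval of e lies inside that of d."""
    return all(a1 <= a2 and b2 <= b1 for (a1, b1), (a2, b2) in zip(d, e))


def common_description(objs):
    """Maximal common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objs))


def matching_objects(d):
    """Maximal set of objects whose description is subsumed by d."""
    return {g for g in DATA if subsumes(d, delta(g))}
```

With this data, `common_description(["g1", "g2"])` gives the intervals [5, 6], [7, 8], [4, 6], and `matching_objects` applied to that description returns g1 and g2 only, since g3 has the value 4 for m1.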
Complex Data
• Numeric data: Interval Pattern Structures [Kaytoue et al 2011]
       m1  m2  m3
  g1   5   7   6
  g2   6   8   4
  g3   4   8   5
  g4   4   9   8
  g5   5   8   5
• Graphs and molecular structures [Kuznetsov and Samokhin 2005]
• Syntactic trees [Leeuwenberg et al 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications
[Figure: NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:subject--> Classification]

To avoid confusion, we will call the objects of FCA entities.
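The position constraints of the definition can be made concrete with a small type check. This is a sketch only: the classes `URI`, `BlankNode` and `Literal` and the function `is_rdf_triple` are illustrative names, and treating `title1` as a literal is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_rdf_triple(s, p, o):
    """Check (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


# The publication example; title1 is assumed to be a literal.
PAPER = URI("NapoliLS97")
GRAPH = [
    (PAPER, URI("dc:creator"), URI("Amedeo_Napoli")),
    (PAPER, URI("dc:title"), Literal("title1")),
    (PAPER, URI("dc:subject"), URI("Classification")),
]
```

A literal in subject position, or a blank node in predicate position, is rejected by `is_rdf_triple`.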
RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
Table: RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, rooted at ⊤, with classes C1, C2 and C4 to C15]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of structured attribute sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

Euler tour of the taxonomy, with the depth D of each visited node:
Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
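The construction of the depth array can be reproduced with a short Python sketch. The tree below is read off the Euler tour shown on the slide; the names `TREE`, `euler_tour` and `lca` are illustrative, the root ⊤ is written `"TOP"`, and the range minimum is found by a plain linear scan, which stands in for whatever RMQ index the actual implementation uses.

```python
# Taxonomy read off the slide; children are listed left to right.
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths along an Euler tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in TREE.get(node, []):
            visit(child, depth + 1)
            # Coming back from a child, the parent is recorded again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")

# First position (0-based) at which each node appears in the tour.
FIRST = {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)


def lca(u, v):
    """Lowest common ancestor: node of minimal depth between u and v in the tour."""
    i, j = sorted((FIRST[u], FIRST[v]))
    k = min(range(i, j + 1), key=DEPTHS.__getitem__)  # naive range minimum
    return NODES[k]
```

For example, `lca("C1", "C5")` is C12 and `lca("C5", "C7")` is the root, which matches the minima at positions 8 and 15 of the depth array.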
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

As tour positions: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. $\{C_1, C_{13}\}$ once the nodes that are ancestors of another result node (C12 at position 8, ⊤ at position 15) and the repeated C13 (positions 19 and 21) are dropped.
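The worked example can be replayed directly on the two arrays of the slide. The following Python sketch is one possible reading of the procedure, not the authors' implementation: `rmq` is a naive linear scan, only consecutive positions coming from different sets are paired (an assumption), and candidates that are ancestors of another candidate are filtered out at the end. The root ⊤ is written `TOP`.

```python
# Depth array and node labels copied from the slide (positions are 1-based).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
LABELS = (
    "TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
    "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP"
).split()


def rmq(i, j):
    """1-based position of the minimal depth in D[i..j] (naive scan)."""
    return min(range(i, j + 1), key=lambda k: D[k - 1])


def is_ancestor(p, q):
    """True if the node at position p is an ancestor of, or equal to, the node at q."""
    i, j = sorted((p, q))
    return LABELS[rmq(i, j) - 1] == LABELS[p - 1]


def intersect(a, b):
    """Return the candidate positions and the most specific common classes."""
    merged = sorted([(p, "A") for p in a] + [(p, "B") for p in b])
    # Assumption: only neighbours coming from different sets are paired.
    candidates = [
        rmq(p, q)
        for (p, src_p), (q, src_q) in zip(merged, merged[1:])
        if src_p != src_q
    ]
    result = set()
    for c in candidates:
        # Drop c when it is a proper ancestor of another candidate.
        dominated = any(
            LABELS[o - 1] != LABELS[c - 1] and is_ancestor(c, o)
            for o in candidates
        )
        if not dominated:
            result.add(LABELS[c - 1])
    return candidates, result
```

On the example, `intersect({4, 12, 20}, {4, 18, 22})` yields the candidate positions 4, 8, 15, 19, 21 and the classes C1 and C13.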
Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, with classes C1, C2 and C4 to C15]
[Figure: Pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Table: Details of the pattern concept lattice
Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Table: Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Table: Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from a domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: Pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }
[Figure: graph pattern of the query: ?paper --dc:creator--> ?author; ?paper --dc:title--> ?title; ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \rightarrow U$
Given the URIs $U$ and the variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification
[Figure: RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:subject--> Classification]
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{title\}$
• $A_v = \{author, keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v2}) = \{Web\ Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
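How a VIEW BY clause turns a result table into a formal context can be sketched in Python. The function `build_context` is a hypothetical helper written for this illustration, not the LBVA implementation; the rows are the ones listed on the slide, and each attribute is kept as a (column, value) pair so that the author and keyword attributes stay distinguishable.

```python
# Result rows as listed on the slide, in the column order below.
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def build_context(rows, columns, view_by):
    """Map each value of the VIEW BY column to the set of its (column, value) attributes."""
    index = columns.index(view_by)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[index], set())
        for column, value in zip(columns, row):
            if column != view_by:
                attributes.add((column, value))
    return context


by_title = build_context(ROWS, COLUMNS, "title")
by_author = build_context(ROWS, COLUMNS, "author")
```

Here `by_title` has the titles as entities, while `by_author` gives the same answers from the authors' point of view.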
Views from Different Perspectives
Example
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
Example
    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: ?x --rdf:type--> Person; ?x --dbp:birthPlace--> Berlin; ?x --dbp:birthDate--> a date ≤ 1900]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films
[Figure: ?x --dcterms:subject--> FrenchFilm]
    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm
    }
[Figure: ?x --dcterms:subject--> FrenchFilm; ?x --rdf:type--> Film; ?x --dbo:hasCountry--> France]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France
    }
From RDF Triples to Formal Context
RDF triples
    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientist>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Predicates  A B C D E
Objects     a b c d e f g
Person1     × × × × × ×
Person2     × × × × ×
Person3     × × × × ×
Person4     × × × ×
Person5     × × × ×
Person6     × ×
Person7     × ×
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule         Confidence  Support  Meaning
d ⇒ c        100%        5        Every scientist has won a Turing Award
c ⇒ d        100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c, d     100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
c, d ⇒ a, b  100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Table: Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
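The completion step can be illustrated with a small Python sketch. It is restricted to the attributes a, b, c and d, whose distribution follows from the rule table (a and b hold for all seven persons, c and d for Person1 to Person5); the remaining columns of the context are left out because their cross positions are not recoverable from the slide. The names `CONTEXT`, `extent`, `confidence` and `completion_candidates` are illustrative.

```python
# Context restricted to attributes a-d (assumption derived from the rule table):
# a = Computer_Scientists, b = Turing_Award_Laureates,
# c = TuringAward, d = Scientist.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})


def extent(attrs):
    """Entities that have every attribute in attrs."""
    return {e for e, desc in CONTEXT.items() if set(attrs) <= desc}


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (assumes X' is not empty)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def completion_candidates(x, y):
    """Entities satisfying X but not Y: candidates if X ≡ Y is accepted as a definition."""
    return extent(x) - extent(y)


AB, CD = {"a", "b"}, {"c", "d"}
print(round(confidence(AB, CD), 2), confidence(CD, AB))
print(sorted(completion_candidates(AB, CD)))
```

If the expert accepts a, b ≡ c, d as a definition, the entities returned by `completion_candidates` are the ones whose descriptions would be completed with c and d.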
Possible Scenario [Alam et al IJCAI 2015]
[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
sbquo Galois connection
A1 ldquo tm P M | g P A Ď G pg mq P Iu
B 1 ldquo tg P G | m P B Ď M pg mq P Iu
sbquo pABq is a formal concept with extent A ldquo B 1
and intent B ldquo A1
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo A formal context is a triple pG M I q- G is a set of objects- M is a set of attributes- I is a binary relation
sbquo Galois connection
A1 ldquo tm P M | g P A Ď G pg mq P Iu
B 1 ldquo tg P G | m P B Ď M pg mq P Iu
sbquo pABq is a formal concept with extent A ldquo B 1
and intent B ldquo A1
sbquo Reduced Labeling- intents are inherited from top to bottom(top-down)
- and extents are inherited from bottom totop (bottom-up)
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Concept Lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 10
Formal Concept Analysis
[Ganter and Wille 1999]
sbquo Object Concept for g P G - Object concept is given as pg2 g 1q- eg ptg2 g5u tm1m2uq
sbquo Attribute Concept for m P M- Attribute concept is given as pm1m2q- eg ptg1 g2 g5u tm1uq
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Concept Lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 11
Association Rules and Implications
Let pG M I q be a formal context and X Y Ď Msbquo X Ntilde Y is an association rule with
- Support σpX Ntilde Y q ldquo|X 1XY 1|
|G |
- Confidence conf pX Ntilde Y q ldquo|X 1XY 1|
|X 1|
sbquo The implication X ugraventilde Y holds i X 1 Ď Y 1
Association Rulem2 Ntilde m3
σpm2 Ntilde m3q ldquo 35 ldquo 60conf pm2 Ntilde m3q ldquo 34 ldquo 75
Implication
m3 ugraventilde m2
conf pm3 ugraventilde m2q ldquo 33 ldquo 100
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 12
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Complex Data
Formal Concept Analysis cannot deal with such complex data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5
sbquo δpg1q ldquo xr5 5s r7 7s r6 6sy
sbquo δpg2q ldquo xr6 6s r8 8s r4 4sy
sbquo ra1 b1s[ra2 b2s ldquo rminpa1 a2qmaxpb1 b2qs
sbquo δpg1q [ δpg2q ldquo xr5 6s r7 8s r4 6sy
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree in which ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]
Depth array D along the Euler tour of the taxonomy:
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Class      ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Class      ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Positions of the classes in D: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the anti-chain $\{C_1, C_{13}\}$ once the less specific classes are dropped
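The worked example above can be replayed with a short sketch. It is only an illustration under stated assumptions: the tree is the one read off the Euler tour on the slide, `TOP` stands for ⊤, the RMQ is a naive linear scan rather than a preprocessed structure, and all function names are hypothetical. It also assumes that only consecutive positions coming from different sets are queried; on the slide's example every consecutive pair is of that kind.

```python
from typing import Dict, List, Set, Tuple

# Taxonomy read off the slide's Euler tour, as parent -> children ("TOP" stands for the root).
TREE: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree: Dict[str, List[str]], root: str) -> Tuple[List[str], List[int]]:
    """Return the visited classes and their depths (the array D)."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # the tour returns to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour(TREE, "TOP")
FIRST: Dict[str, int] = {}
LAST: Dict[str, int] = {}
for position, name in enumerate(NODES, start=1):  # 1-based positions, as on the slide
    FIRST.setdefault(name, position)
    LAST[name] = position


def rmq(i: int, j: int) -> int:
    """Position of the minimal depth in D[i..j] (naive linear scan)."""
    return min(range(i, j + 1), key=lambda k: DEPTHS[k - 1])


def rmq_positions(a: Set[str], b: Set[str]) -> List[int]:
    """Query consecutive positions of the merged list that come from different sets."""
    merged = sorted([(FIRST[c], "A") for c in a] + [(FIRST[c], "B") for c in b])
    return [rmq(p, q) for (p, s), (q, t) in zip(merged, merged[1:]) if s != t]


def is_strict_ancestor(u: str, v: str) -> bool:
    """u is a proper ancestor of v if v is first visited inside u's tour interval."""
    return u != v and FIRST[u] <= FIRST[v] <= LAST[u]


def similarity(a: Set[str], b: Set[str]) -> Set[str]:
    """Keep only the most specific lowest common ancestors (an anti-chain)."""
    found = {NODES[p - 1] for p in rmq_positions(a, b)}
    return {c for c in found if not any(is_strict_ancestor(c, d) for d in found)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(rmq_positions(A, B))       # [4, 8, 15, 19, 21]
print(sorted(similarity(A, B)))  # ['C1', 'C13']
```

A linear scan keeps the sketch short; a sparse table or another preprocessed RMQ structure would answer each query faster on a large taxonomy.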
Resulting Pattern Concept Lattice
Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy T, rooted at ⊤, over the classes C1, C2 and C4 to C15]
[Figure: pattern concept lattice with top concept K0, bottom concept K11 and intermediate concepts K1 to K10]
Pattern Concept lattice
KID  Extent              Intent
K1   {s1, s2, s4, s5}    (p1, {C14})
K2   {s1, s3, s4}        (p1, {C12})
K3   {s1, s4}            (p1, {C7, C12})
K4   {s2, s4, s5}        (p1, {C8})
K5   {s3, s4}            (p1, {C4})
K6   {s1}                (p1, {C1, C2, C7})
K8   {s2, s5}            (p1, {C8, C9})
K7   {s4}                (p1, {C4, C7, C8})
K9   {s3}                (p1, {C4, C5})
K10  {s2}                (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data
Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice with top concept K0, bottom concept K11 and intermediate concepts K1 to K10]
KID  Extent              Intent
K1   {s1, s2, s4, s5}    (p1, {C14})
K2   {s1, s3, s4}        (p1, {C12})
K3   {s1, s4}            (p1, {C7, C12})
K4   {s2, s4, s5}        (p1, {C8})
K5   {s3, s4}            (p1, {C4})
K6   {s1}                (p1, {C1, C2, C7})
K8   {s2, s5}            (p1, {C8, C9})
K7   {s4}                (p1, {C4, C7, C8})
K9   {s3}                (p1, {C4, C5})
K10  {s2}                (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Roadmap
Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }
[Figure: query graph pattern in which ?paper is linked by dc:creator to ?author, by dc:title to ?title and by dc:subject to ?keywords]
Mapping $\mu : V \to U$
Given $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification
[Figure: RDF graph in which NapoliLS97 is linked by dc:creator to Amedeo_Napoli, by dc:title to title1 and by dc:subject to Classification]
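To make the notion of a mapping concrete, here is a minimal sketch of how the three triple patterns of the query could be matched against the example graph. It is an illustration only: the FILTER clause is not modelled, terms are plain strings, and the helper names `match` and `evaluate` are hypothetical rather than part of any SPARQL engine.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # partial function from variables to terms

# Example graph from the slide.
TRIPLES: List[Triple] = [
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
]

# Triple patterns of the query; terms starting with "?" are variables.
PATTERNS: List[Triple] = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]


def match(pattern: Triple, triple: Triple, mu: Mapping) -> Optional[Mapping]:
    """Extend mu so that the pattern matches the triple, or return None."""
    extended = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was already bound to.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Mapping]:
    """Return every mapping that satisfies all patterns."""
    mappings: List[Mapping] = [{}]
    for pattern in patterns:
        extended: List[Mapping] = []
        for mu in mappings:
            for triple in triples:
                candidate = match(pattern, triple, mu)
                if candidate is not None:
                    extended.append(candidate)
        mappings = extended
    return mappings


for mu in evaluate(PATTERNS, TRIPLES):
    print(mu["?keywords"])  # Classification
```

Each returned dictionary plays the role of one mapping; the binding of `?keywords` to Classification corresponds to the example mapping on the slide.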
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1
• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \dots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
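The effect of VIEW BY can be sketched as grouping the answer rows by the chosen variable: its values become the entities, and the values of the remaining variables become their attributes. The snippet below is an assumption-laden illustration, not the implementation behind the slides; the function name `view_by` and the (variable, value) encoding of attributes are hypothetical, and the rows are the distinct rows of the result table.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Attribute = Tuple[str, str]  # (variable name, bound value)

# Distinct answer rows from the slide's result table.
ROWS: List[Row] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows: List[Row], entity_var: str) -> Dict[str, Set[Attribute]]:
    """Group rows by entity_var; all other variables contribute attributes."""
    context: Dict[str, Set[Attribute]] = {}
    for row in rows:
        attributes = context.setdefault(row[entity_var], set())
        for variable, value in row.items():
            if variable != entity_var:
                attributes.add((variable, value))
    return context


by_title = view_by(ROWS, "title")
print(sorted(by_title))            # ['title1', 'title2']
print(sorted(by_title["title1"]))  # [('author', 'author1'), ('keyword', 'RDF')]
```

Calling `view_by(ROWS, "author")` on the same rows gives a context whose entities are authors, which is one way to read the idea of looking at the same answers from another perspective.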
Views from Different Perspectives
Example
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER(regex(?keyword, "Web Crawling", "i") ||
           regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
Example
  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER(regex(?keyword, "Web Crawling", "i") ||
           regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern in which ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }
French Films
[Figure: graph pattern in which ?x has dcterms:subject FrenchFilm]
  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm
  }
[Figure: graph pattern in which ?x has rdf:type Film and dbo:hasCountry France]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France
  }
From RDF Triples to Formal Context
RDF triples
  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientist>
Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom
Formal context over the attributes a, b (A), c (B), d (C), e (D) and f, g (E), with the crosses of each entity:
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
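The scaling step described above can be sketched as follows: every (predicate, object) pair becomes one binary attribute of the subject. This is a minimal illustration using only the four Person1 triples listed on the slide; the function name `build_context` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Attribute = Tuple[str, str]  # (predicate, object)

# The Person1 triples listed on the slide.
TRIPLES: List[Triple] = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def build_context(triples: List[Triple]) -> Dict[str, Set[Attribute]]:
    """Map each subject to the set of (predicate, object) attributes it has."""
    context: Dict[str, Set[Attribute]] = {}
    for subject, predicate, obj in triples:
        # One cross in the formal context per triple.
        context.setdefault(subject, set()).add((predicate, obj))
    return context


context = build_context(TRIPLES)
print(len(context["Person1"]))  # 4
```

Keeping the predicate in the attribute distinguishes the same object reached through different predicates, which mirrors the two-level header (A to E, a to g) of the table.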
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete
Rule                      Confidence  Support  Meaning
$d \Rightarrow c$         100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$         100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$      100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$   100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$           71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
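The reasoning above can be checked with a small sketch. The context used here is an assumption: it keeps only the attributes a, b, c and d, with Person1 to Person5 having all four and Person6 and Person7 having only a and b, which is consistent with the supports and confidences in the rule table. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set

# Assumed context restricted to attributes a, b, c, d (consistent with the rule table).
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes: Set[str]) -> Set[str]:
    """Entities that have every attribute in the given set."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def confidence(x: Set[str], y: Set[str]) -> Fraction:
    """conf(X -> Y); assumes at least one entity has all attributes of X."""
    return Fraction(len(extent(x | y)), len(extent(x)))


def needs_completion(x: Set[str], y: Set[str]) -> Set[str]:
    """Entities that satisfy X but lack part of Y."""
    return extent(x) - extent(y)


print(confidence({"c", "d"}, {"a", "b"}))                # 1
print(round(float(confidence({"a", "b"}, {"c", "d"})), 2))  # 0.71
print(sorted(needs_completion({"a", "b"}, {"c", "d"})))  # ['Person6', 'Person7']
```

One direction holds with confidence 1 and the converse falls just short, so the entities returned by `needs_completion` are the candidates for added triples if the rule is accepted as a definition.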
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition $X \equiv Y$?" and, on "Yes", Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars                  Videogames     Smartphones          Countries
Restriction  dc:subject            dc:subject     dc:subject           rdf:type
             dbpc:Sports_cars      dbpc:FPS       dbpc:Smartphones     Country
Predicates   rdf:type              rdf:type       rdf:type             rdf:type
             dc:subject            dc:subject     dc:subject           dc:subject
             bodyStyle             cp             manufacturer         language
             transmission          developer      operativeSystem      governmentType
             assembly              requirement    developer            leaderType
             designer              genre          cpu                  foundingDate
             layout                releaseDate                         gdpPppRank
FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset
Dataset characteristics
Dataset         Cars   Videogames  Smartphones  Countries
Subjects        529    655         363          3153
Objects         1291   3265        495          8315
Triples         12519  20146       4710         50000
Concepts        14657  31031       1232         13754
Exec. time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Formal Concept Analysis [Ganter and Wille, 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{m \in M \mid \forall g \in A \subseteq G, (g, m) \in I\}$
  $B' = \{g \in G \mid \forall m \in B \subseteq M, (g, m) \in I\}$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - and extents are inherited from bottom to top (bottom-up)
     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
[Figure: Concept Lattice]
Formal Concept Analysis [Ganter and Wille, 1999]
• Object Concept for $g \in G$
  - Object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute Concept for $m \in M$
  - Attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
[Figure: Concept Lattice]
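The derivation operators and the two kinds of concepts can be tried out on the small context. A caveat: the positions of the crosses below are an assumption, inferred from the examples given on the slides (the object concept of g2, the attribute concept of m1, and the rule measures for m2 and m3). The function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed incidence relation, inferred from the worked examples on the slides.
CONTEXT: Dict[str, Set[str]] = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES: Set[str] = {"m1", "m2", "m3"}


def intent(objects: Set[str]) -> Set[str]:
    """A': attributes shared by all objects in A (all attributes if A is empty)."""
    if not objects:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[g] for g in objects))


def extent(attributes: Set[str]) -> Set[str]:
    """B': objects that have every attribute in B."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def object_concept(g: str) -> Tuple[Set[str], Set[str]]:
    """(g'', g') for a single object g."""
    g_prime = intent({g})
    return extent(g_prime), g_prime


def attribute_concept(m: str) -> Tuple[Set[str], Set[str]]:
    """(m', m'') for a single attribute m."""
    m_prime = extent({m})
    return m_prime, intent(m_prime)


print(object_concept("g2"))     # extent {g2, g5}, intent {m1, m2}
print(attribute_concept("m1"))  # extent {g1, g2, g5}, intent {m1}
```

Applying `extent` after `intent` (or the reverse) is the closure that turns any set of objects or attributes into a formal concept.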
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \to Y$ is an association rule with
  - Support $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association Rule: $m_2 \to m_3$
  $\sigma(m_2 \to m_3) = 3/5 = 60\%$
  $conf(m_2 \to m_3) = 3/4 = 75\%$
Implication: $m_3 \Rightarrow m_2$
  $conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
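The two measures and the implication test can be computed directly from the definitions. As before, the cross positions are an assumption inferred from the worked examples, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set

# Assumed incidence relation, inferred from the worked examples on the slides.
CONTEXT: Dict[str, Set[str]] = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes: Set[str]) -> Set[str]:
    """X': objects that have every attribute in X."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def support(x: Set[str], y: Set[str]) -> Fraction:
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x: Set[str], y: Set[str]) -> Fraction:
    """|X' ∩ Y'| / |X'|; assumes X' is not empty."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x: Set[str], y: Set[str]) -> bool:
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}))            # 3/5
print(confidence({"m2"}, {"m3"}))         # 3/4
print(implication_holds({"m3"}, {"m2"}))  # True
```

An implication is simply an association rule whose confidence is 1, which is why `implication_holds({"m2"}, {"m3"})` is false here while the converse is true.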
Complex Data
Formal Concept Analysis cannot deal with such complex data
     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5
• Numeric Data
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures [Ganter and Kuznetsov, 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$
     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
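For interval descriptions, both operators of the Galois connection reduce to a few lines. The sketch below uses the three-object table from the pattern-structure example; the subsumption test $d \sqsubseteq \delta(g)$ is written as $d \sqcap \delta(g) = d$, which is the usual definition and is stated here as an assumption, since the slides do not spell it out. Function names are hypothetical.

```python
from functools import reduce
from typing import Dict, Set, Tuple

Interval = Tuple[int, int]
Description = Tuple[Interval, ...]

# Numeric table with the three objects used in the pattern-structure example.
DATA: Dict[str, Tuple[int, ...]] = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
}


def delta(g: str) -> Description:
    """Describe an object by degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(d1: Description, d2: Description) -> Description:
    """Similarity: componentwise smallest interval covering both intervals."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed(d1: Description, d2: Description) -> bool:
    """d1 is subsumed by d2 when their meet is d1 itself (assumed definition)."""
    return meet(d1, d2) == d1


def common_description(objects: Set[str]) -> Description:
    """Meet of the descriptions of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d: Description) -> Set[str]:
    """Objects whose description is covered by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = common_description({"g1", "g2"})
print(d)                  # ((5, 6), (7, 8), (4, 6))
print(sorted(extent(d)))  # ['g1', 'g2']
```

Object g3 is left out of the extent because its first value, 4, falls outside the interval [5, 6], so the pair shown on the slide is closed on both sides.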
Complex Data
     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5
• Interval Pattern Structures [Kaytoue et al., 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]
• Syntactic Tree [Leeuwenberg et al., 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries
Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: query graph pattern: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]
Mapping $\mu : V \to U$
Given U and V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$ : ?keywords → Classification
[Figure: RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title   keyword             author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{WebCrawling, RDF, \ldots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
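The VIEW BY clause can be read as choosing which query variable supplies the entities of a formal context, with the remaining variables supplying the attributes. The Python sketch below illustrates that reading on a few hand-written result rows. The function name `build_context` and the encoding of attributes as (variable, value) pairs are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A few illustrative result rows (title, keyword, author); the values are hypothetical.
ROWS: List[Dict[str, str]] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def build_context(rows: List[Dict[str, str]], view_by: str) -> Dict[str, Set[Tuple[str, str]]]:
    """Group rows by the VIEW BY variable; every other binding becomes an attribute."""
    context: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for row in rows:
        entity = row[view_by]
        for variable, value in row.items():
            if variable != view_by:
                context[entity].add((variable, value))
    return dict(context)


by_title = build_context(ROWS, "title")    # entities are titles
by_author = build_context(ROWS, "author")  # same answers, seen from the authors
```

Switching `view_by` from `"title"` to `"author"` reorganizes the same answers around a different entity set, which is the idea behind viewing results from different perspectives.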
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> value ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
[Figure: equivalent graph pattern: ?x --rdf:type--> Film, ?x --dbo:hasCountry--> France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates           Objects
Index  URI           Index  URI
A      dc:subject    a      dbpc:Computer_Scientists
                     b      dbpc:Turing_Award_Laureates
B      dbp:award     c      dbp:TuringAward
C      rdf:type      d      dbo:Scientist
D      dbp:field     e      dbp:Computer Sciences
E      dbp:birthPlace  f    dbo:UnitedStates
                     g      dbo:UnitedKingdom

Predicates  A B C D E
Objects     a b c d e f g
Person1     × × × × × ×
Person2     × × × × ×
Person3     × × × × ×
Person4     × × × ×
Person5     × × × ×
Person6     × ×
Person7     × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                   Confidence  Support  Meaning
$d \Rightarrow c$      100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$      100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$   100%        2        All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$  100%      5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$        71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
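The reasoning on this slide (an implication in one direction plus a high-confidence rule in the other suggests a definition and points at entities to complete) can be sketched in a few lines of Python. The cross positions in `CONTEXT` are an assumption: they are chosen only so that they agree with the rule table (a and b for all seven persons, c and d for Person1 to Person5). The helper names are hypothetical.

```python
from typing import Dict, Set

# Assumed context restricted to attributes a, b, c, d (hypothetical cross positions
# chosen to agree with the confidences and supports quoted in the rule table).
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs: Set[str]) -> Set[str]:
    """Entities whose description contains every attribute in attrs."""
    return {g for g, description in CONTEXT.items() if attrs <= description}


def confidence(x: Set[str], y: Set[str]) -> float:
    """conf(X -> Y) = |X' ∩ Y'| / |X'|, with X' ∩ Y' computed as (X ∪ Y)'."""
    covered = extent(x)
    return len(extent(x | y)) / len(covered) if covered else 0.0


def implication_holds(x: Set[str], y: Set[str]) -> bool:
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


def completion_candidates(x: Set[str], y: Set[str]) -> Set[str]:
    """Entities that satisfy X but lack Y: candidates for completion if X ≡ Y is accepted."""
    return extent(x) - extent(y)


X, Y = {"a", "b"}, {"c", "d"}
print(implication_holds(Y, X))              # True: c,d => a,b
print(round(confidence(X, Y), 2))           # 0.71
print(sorted(completion_candidates(X, Y)))  # ['Person6', 'Person7']
```

Whether a high-confidence rule really is a definition remains a judgment for the domain expert; the code only surfaces the candidates.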
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank
FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) versus recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Formal Concept Analysis [Ganter and Wille, 1999]
• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
[Figure: concept lattice of this context]
• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
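The two derivation operators and the object and attribute concepts can be checked on the five-object example with a short Python sketch. The cross positions in `CONTEXT` are inferred from the examples quoted on the slides (they reproduce the object concept of g2 and the attribute concept of m1), and the function names are chosen here for illustration.

```python
from typing import Dict, Set, Tuple

# Running example; cross positions inferred from the slides' worked examples.
CONTEXT: Dict[str, Set[str]] = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES: Set[str] = {"m1", "m2", "m3"}


def common_attributes(objects: Set[str]) -> Set[str]:
    """A' : attributes shared by every object in A (all of M when A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def common_objects(attributes: Set[str]) -> Set[str]:
    """B' : objects that have every attribute in B."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def object_concept(g: str) -> Tuple[Set[str], Set[str]]:
    """(g'', g') : the most specific concept whose extent contains g."""
    intent = common_attributes({g})
    return common_objects(intent), intent


def attribute_concept(m: str) -> Tuple[Set[str], Set[str]]:
    """(m', m'') : the most general concept whose intent contains m."""
    extent = common_objects({m})
    return extent, common_attributes(extent)


print(object_concept("g2"))     # extent {g2, g5}, intent {m1, m2}
print(attribute_concept("m1"))  # extent {g1, g2, g5}, intent {m1}
```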
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$
• $X \to Y$ is an association rule with
  - Support $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $\mathrm{conf}(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association Rule
$m_2 \to m_3$
$\sigma(m_2 \to m_3) = 3/5 = 60\%$, $\mathrm{conf}(m_2 \to m_3) = 3/4 = 75\%$
Implication
$m_3 \Rightarrow m_2$
$\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
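As a quick check of the figures on this slide, the following Python sketch computes support and confidence directly from the definitions. The context is the five-object running example with cross positions inferred from the quoted values (an assumption), and the function names are illustrative.

```python
from typing import Dict, Set

# Running example; cross positions inferred so that they match the quoted 3/5 and 3/4.
CONTEXT: Dict[str, Set[str]] = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes: Set[str]) -> Set[str]:
    """X' : objects that have every attribute in X."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def support(x: Set[str], y: Set[str]) -> float:
    """sigma(X -> Y) = |X' ∩ Y'| / |G|."""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x: Set[str], y: Set[str]) -> float:
    """conf(X -> Y) = |X' ∩ Y'| / |X'| (undefined when X' is empty)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def implication_holds(x: Set[str], y: Set[str]) -> bool:
    """X => Y holds iff X' is a subset of Y', i.e. confidence is 100%."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}))            # 0.6
print(confidence({"m2"}, {"m3"}))         # 0.75
print(implication_holds({"m3"}, {"m2"}))  # True
```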
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric Data

       m1  m2  m3
  g1   5   7   6
  g2   6   8   4
  g3   4   8   5
  g4   4   9   8
  g5   5   8   5
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures [Ganter and Kuznetsov, 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

       m1  m2  m3
  g1   5   7   6
  g2   6   8   4
  g3   4   8   5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
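For interval patterns, the similarity operation and the two derivation operators fit in a few lines of Python. The sketch below uses the three-object numeric table from the pattern structure slide; subsumption is tested as "d ⊓ p = d", and all function names are chosen here for illustration rather than taken from an existing library.

```python
from functools import reduce
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[int, int]
Pattern = Tuple[Interval, ...]

# Numeric table with objects g1..g3 and attributes m1..m3.
DATA: Dict[str, Tuple[int, ...]] = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
}


def delta(g: str) -> Pattern:
    """Describe an object by degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(p: Pattern, q: Pattern) -> Pattern:
    """Component-wise similarity: the smallest interval covering both operands."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(p, q))


def subsumes(d: Pattern, p: Pattern) -> bool:
    """d ⊑ p iff d ⊓ p = d, i.e. every interval of p lies inside the matching interval of d."""
    return meet(d, p) == d


def description(objects: Iterable[str]) -> Pattern:
    """Maximal common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d: Pattern) -> Set[str]:
    """Objects whose description is subsumed by d."""
    return {g for g in DATA if subsumes(d, delta(g))}


d = description({"g1", "g2"})
print(d)                   # ((5, 6), (7, 8), (4, 6))
print(sorted(extent(d)))   # ['g1', 'g2']
```

Here g3 is excluded because its value 4 for m1 falls outside [5, 6], so ({g1, g2}, ⟨[5, 6], [7, 8], [4, 6]⟩) is closed on both sides.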
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al., 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]
• Syntactic Tree [Leeuwenberg et al., 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]
To avoid confusion, we will call objects in FCA entities.
RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy T, with root ⊤ and classes C1, C2 and C4 to C15]
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of structured attribute sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array, Range Minimum Query finds the minimal value in a sub-array of an array of comparable objects.
[Figure: taxonomy: ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]
The taxonomy is traversed depth-first; each visited node and its depth D are recorded, step by step, until the whole tree is covered:

Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?
Each class is replaced by its position in the array D above:
$A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$, $A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
An RMQ is computed for each pair of consecutive positions:
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
Positions 8 (C12) and 15 (⊤) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so only the most specific classes remain: {C1, C13}.
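The worked example can be reproduced with a small Python sketch. It builds the depth-first tour of the taxonomy, answers each range minimum query by a linear scan (a sparse table or similar index would normally replace this), and keeps only the most specific common ancestors. Two points are assumptions of this sketch rather than statements from the slides: queries are issued only between consecutive positions that come from different sets, and the names `TREE`, `lca` and `meet` are illustrative. Positions are 0-based in the code and 1-based on the slides.

```python
from typing import Dict, List, Set, Tuple

# Taxonomy used in the RMQ example, as child lists ("TOP" stands for the root ⊤).
TREE: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree: Dict[str, List[str]], root: str) -> Tuple[List[str], List[int]]:
    """Depth-first tour recording every visited node together with its depth."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # back at the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour(TREE, "TOP")
FIRST: Dict[str, int] = {}
for position, name in enumerate(NODES):
    FIRST.setdefault(name, position)  # first occurrence of each class in the tour


def rmq(i: int, j: int) -> int:
    """Position of the minimal depth between i and j (linear scan for clarity)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda k: DEPTHS[k])


def lca(a: str, b: str) -> str:
    """Least common ancestor = shallowest node between the two first occurrences."""
    return NODES[rmq(FIRST[a], FIRST[b])]


def meet(A: Set[str], B: Set[str]) -> Set[str]:
    """Similarity of two anti-chains: most specific LCAs of neighbouring A/B positions."""
    tagged = sorted([(FIRST[a], "A", a) for a in A] + [(FIRST[b], "B", b) for b in B])
    candidates = {
        lca(x, y)
        for (_, tag_x, x), (_, tag_y, y) in zip(tagged, tagged[1:])
        if tag_x != tag_y
    }
    # Drop every candidate that is a strict ancestor of another candidate.
    return {c for c in candidates
            if not any(d != c and lca(c, d) == c for d in candidates)}


print(sorted(meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

Restricting the queries to neighbouring positions keeps their number linear in |A| + |B| instead of comparing every pair of classes.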
Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: taxonomy with root ⊤ and classes C1, C2 and C4 to C15]
[Figure: pattern concept lattice with concepts K0 to K11]
Pattern Concept lattice

KID  Extent            Intent
K1   {s1, s2, s4, s5}  (p1, {C14})
K2   {s1, s3, s4}      (p1, {C12})
K3   {s1, s4}          (p1, {C7, C12})
K4   {s2, s4, s5}      (p1, {C8})
K5   {s3, s4}          (p1, {C4})
K6   {s1}              (p1, {C1, C2, C7})
K8   {s2, s5}          (p1, {C8, C9})
K7   {s4}              (p1, {C4, C7, C8})
K9   {s3}              (p1, {C4, C5})
K10  {s2}              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting denitions from implications and association rules
sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data
Rule Condence Support Meaning
d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a
Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-
rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and
Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071
sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b
Mehwish Alam Interactive Knowledge Discovery over Web of Data 38
Possible Scenario [Alam et al., IJCAI 2015]
[Flowchart: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Predicates used to construct each dataset:
  Dataset | Restriction | Predicates
  Cars | dc:subject dbpc:Sports_cars | rdf:type, dc:subject, bodyStyle, transmission, assembly, designer, layout
  Videogames | dc:subject dbpc:FPS | rdf:type, dc:subject, cp, developer, requirement, genre, releaseDate
  Smartphones | dc:subject dbpc:Smartphones | rdf:type, dc:subject, manufacturer, operativeSystem, developer, cpu
  Countries | rdf:type Country | rdf:type, dc:subject, language, govenmentType, leaderType, foundingDate, gdpPppRank
  Abbreviations: FPS = First_Person_Shooters, cp = computerPlatform
Dataset characteristics:
  Dataset | Subjects | Objects | Triples | Concepts | Exec. time [s] (digits as extracted, decimal points lost)
  Cars | 529 | 1291 | 12519 | 14657 | 1732
  Videogames | 655 | 3265 | 20146 | 31031 | 1714
  Smartphones | 363 | 495 | 4710 | 1232 | 07
  Countries | 3153 | 8315 | 50000 | 13754 | 5982
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
Roadmap
Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Formal Concept Analysis
[Ganter and Wille, 1999]
• Object concept: for $g \in G$, the object concept is given as $(g'', g')$, e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute concept: for $m \in M$, the attribute concept is given as $(m', m'')$, e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
     | m1 | m2 | m3
  g1 | ×  |    |
  g2 | ×  | ×  |
  g3 |    | ×  | ×
  g4 |    | ×  | ×
  g5 | ×  | ×  | ×
[Figure: concept lattice of this context]
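The two derivation operators and the object and attribute concepts can be sketched as follows. The cross positions in `CONTEXT` are an assumption: they are the placement consistent with the two example concepts on this slide and with the rule counts of the next slide. Function names are hypothetical.

```python
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}

def intent(objects):
    """Attributes shared by all given objects (prime operator on object sets)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared

def extent(attributes):
    """Objects having all given attributes (prime operator on attribute sets)."""
    return {g for g, have in CONTEXT.items() if attributes <= have}

def object_concept(g):
    """Return (g'', g') for an object g."""
    g_prime = intent({g})
    return extent(g_prime), g_prime

def attribute_concept(m):
    """Return (m', m'') for an attribute m."""
    m_prime = extent({m})
    return m_prime, intent(m_prime)

print(object_concept("g2"))
print(attribute_concept("m1"))
```

Under the assumed context, `object_concept("g2")` yields the extent {g2, g5} with intent {m1, m2}, and `attribute_concept("m1")` yields the extent {g1, g2, g5} with intent {m1}, matching the examples on the slide.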
Association Rules and Implications
Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.
• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$
Association rule $m_2 \rightarrow m_3$:
  $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$
Implication $m_3 \Rightarrow m_2$:
  $\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
     | m1 | m2 | m3
  g1 | ×  |    |
  g2 | ×  | ×  |
  g3 |    | ×  | ×
  g4 |    | ×  | ×
  g5 | ×  | ×  | ×
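The support and confidence formulas can be checked numerically with a short sketch. The context is the five-object example of the slide, with cross positions assumed as the placement consistent with the reported values; the function names are illustrative.

```python
from fractions import Fraction

CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}

def extent(attributes):
    """X': the objects having all attributes of X."""
    return {g for g, have in CONTEXT.items() if attributes <= have}

def support(x, y):
    """sigma(X -> Y) = |X' & Y'| / |G|."""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))

def confidence(x, y):
    """conf(X -> Y) = |X' & Y'| / |X'|."""
    ex = extent(x)
    return Fraction(len(ex & extent(y)), len(ex))

def holds(x, y):
    """The implication X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)

print(support({"m2"}, {"m3"}))     # 3/5
print(confidence({"m2"}, {"m3"}))  # 3/4
print(holds({"m3"}, {"m2"}))       # True
```

An implication is thus the special case of an association rule whose confidence equals 1.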
Complex Data
Formal Concept Analysis cannot deal with such complex data:
• Numeric Data
       | m1 | m2 | m3
    g1 | 5  | 7  | 6
    g2 | 6  | 8  | 4
    g3 | 4  | 8  | 5
    g4 | 4  | 9  | 8
    g5 | 5  | 8  | 5
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov, 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$
     | m1 | m2 | m3
  g1 | 5  | 7  | 6
  g2 | 6  | 8  | 4
  g3 | 4  | 8  | 5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:
• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
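A minimal Python sketch of the interval pattern structure and its two derivation operators, using the three-object numeric table from the preceding slide. Intervals are modelled as (low, high) pairs; the function names are hypothetical, and subsumption is assumed to follow the usual convention that d is subsumed by e exactly when their meet equals d.

```python
from functools import reduce

DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}

def delta(g):
    """Describe object g as a tuple of degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])

def meet(d1, d2):
    """Similarity of two descriptions: component-wise convex hull."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))

def subsumed(d, e):
    """d is subsumed by e iff meet(d, e) == d (each interval of e lies inside d)."""
    return meet(d, e) == d

def description(objects):
    """Maximal common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))

def objects_of(d):
    """All objects whose description is covered by d."""
    return {g for g in DATA if subsumed(d, delta(g))}

d = description({"g1", "g2"})
print(d)              # ((5, 6), (7, 8), (4, 6))
print(objects_of(d))  # g1 and g2 only; g3 has m1 = 4, outside [5, 6]
```

Applying the two operators in turn returns the starting set {g1, g2}, so the pair forms a pattern concept as stated on the slide.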
Complex Data
• Numeric data: Interval Pattern Structures [Kaytoue et al., 2011]
       | m1 | m2 | m3
    g1 | 5  | 7  | 6
    g2 | 6  | 8  | 4
    g3 | 4  | 8  | 5
    g4 | 4  | 9  | 8
    g5 | 5  | 8  | 5
• Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]
• Syntactic Tree [Leeuwenberg et al., 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:subject--> Classification]
To avoid confusion, we will call the objects of FCA entities.
RDF to Entity-Descriptions
RDF triples from several resources:
  tid | Subject | Predicate | Object | Provenance
  t1 | s1 | dc:subject | o11 | DBLP
  t2 | s2 | dc:subject | o16 | DBLP
  t3 | s1 | rdf:type | Publication | DBLP
  t4 | o11 | rdfs:subClassOf | C1 | ACCS
  t5 | o12 | rdfs:subClassOf | C2 | ACCS
  t6 | C1 | rdfs:subClassOf | C10 | ACCS
Entity descriptions:
  Entities S | Descriptions d
  s1 | (dc:subject, {C1, C2, C7})
  s2 | (dc:subject, {C6, C8, C9})
  s3 | (dc:subject, {C4, C5})
  s4 | (dc:subject, {C4, C7, C8})
  s5 | (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy $\mathcal{T}$ with top element $\top$ and classes C12 (C10: C1, C2; C11: C4, C5) and C15 (C13: C6; C14: C7, C8, C9)]
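One possible reading of this step, sketched in Python: the triples from DBLP give each paper its keywords, and the triples from ACCS link keywords to taxonomy classes, so joining both yields the entity descriptions. This is an assumption about the procedure, not the authors' implementation, and only the six triples listed on the slide are used, so the resulting descriptions are smaller than those in the table.

```python
from collections import defaultdict

# (subject, predicate, object, provenance), as listed on the slide.
TRIPLES = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]

def entity_descriptions(triples, predicate="dc:subject"):
    """Join data triples with schema triples into (predicate, classes) pairs."""
    # Schema part: term -> classes it is declared a subclass of.
    classes_of = defaultdict(set)
    for s, p, o, _ in triples:
        if p == "rdfs:subClassOf":
            classes_of[s].add(o)
    # Data part: replace each keyword by its taxonomy classes.
    described = defaultdict(set)
    for s, p, o, _ in triples:
        if p == predicate:
            described[s] |= classes_of.get(o, set())
    return {s: (predicate, classes) for s, classes in described.items()}

for entity, description in sorted(entity_descriptions(TRIPLES).items()):
    print(entity, description)
```

Here s1 is described by C1 because its keyword o11 is linked to C1, while s2 stays empty because the excerpt contains no schema triple for o16.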
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree with top element $\top$; $\top$ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]
Euler tour of the tree, built one visited node at a time, with the depth array D:
  Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
  Node:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
  Depth D:  0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0
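The construction of the Euler tour and the reduction of LCA to a range minimum query can be sketched as follows. The tree is read off the tour on the slide, with "T" standing for the top element. The RMQ below is a naive linear scan chosen for brevity; it is an illustrative stand-in, since efficient RMQ structures preprocess the depth array instead.

```python
TREE = {
    "T": ["C12", "C6"],          # "T" stands for the top element
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(tree, root):
    """Return the visited nodes and their depths in Euler-tour order."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            # Coming back from a child, the parent is recorded again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths

nodes, depths = euler_tour(TREE, "T")

# First position (0-based) at which each node occurs in the tour.
first = {}
for i, n in enumerate(nodes):
    first.setdefault(n, i)

def lca(u, v):
    """LCA of u and v: the shallowest node between their first occurrences."""
    i, j = sorted((first[u], first[v]))
    k = min(range(i, j + 1), key=depths.__getitem__)  # naive RMQ
    return nodes[k]

print(depths)
print(lca("C1", "C5"), lca("C8", "C9"), lca("C5", "C7"))
```

The printed depth list reproduces array D from the slide, and the three queries return C12, C13 and the top element respectively.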
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?
  Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
  Node:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
  Depth D:  0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0
Replacing each class by its first position in the tour gives
  $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$, $A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
The range minimum is computed for each pair of consecutive positions:
  RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
  Result = $\{4, 8, 15, 19, 21\}$, which reduces to the anti-chain $\{4, 21\}$, i.e. $\{C1, C13\}$
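The worked example can be replayed with a short sketch. The arrays are copied from the slide, with "T" standing for the top element and positions kept 1-based. Restricting the candidates to consecutive positions that come from different sets, and the final filtering into an anti-chain, are assumptions about how the reduction from five positions to two is obtained. The function names are hypothetical.

```python
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()

def rmq(i, j):
    """1-based position of the minimum depth within positions i..j."""
    return min(range(i, j + 1), key=lambda k: DEPTH[k - 1])

def is_ancestor(p, q):
    """True if the node at position p is an ancestor-or-self of the one at q."""
    lo, hi = min(p, q), max(p, q)
    return NODES[rmq(lo, hi) - 1] == NODES[p - 1]

def intersect(a, b):
    """Candidate LCA positions and the most specific classes among them."""
    merged = sorted([(p, "A") for p in a] + [(p, "B") for p in b])
    # One LCA per pair of neighbours that come from different sets.
    candidates = [rmq(p, q)
                  for (p, s), (q, t) in zip(merged, merged[1:]) if s != t]

    def strictly_above(c, d):
        return NODES[c - 1] != NODES[d - 1] and is_ancestor(c, d)

    kept = [c for c in candidates
            if not any(strictly_above(c, d) for d in candidates)]
    return candidates, {NODES[c - 1] for c in kept}

candidates, result = intersect({4, 12, 20}, {4, 18, 22})
print(candidates)      # [4, 8, 15, 19, 21]
print(sorted(result))  # ['C1', 'C13']
```

Positions 19 and 21 both denote C13, which is why the final anti-chain contains two classes although three candidate positions survive the ancestor filter.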
Resulting Pattern Concept Lattice
  Entities S | Descriptions d
  s1 | (dc:subject, {C1, C2, C7})
  s2 | (dc:subject, {C6, C8, C9})
  s3 | (dc:subject, {C4, C5})
  s4 | (dc:subject, {C4, C7, C8})
  s5 | (dc:subject, {C8, C9})
[Figure: taxonomy with top element $\top$ and classes C12 (C10: C1, C2; C11: C4, C5) and C15 (C13: C6; C14: C7, C8, C9)]
[Figure: pattern concept lattice, levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
Details of the pattern concept lattice:
  KID | Extent | Intent
  K1 | s1, s2, s4, s5 | (p1, {C14})
  K2 | s1, s3, s4 | (p1, {C12})
  K3 | s1, s4 | (p1, {C7, C12})
  K4 | s2, s4, s5 | (p1, {C8})
  K5 | s3, s4 | (p1, {C4})
  K6 | s1 | (p1, {C1, C2, C7})
  K8 | s2, s5 | (p1, {C8, C9})
  K7 | s4 | (p1, {C4, C7, C8})
  K9 | s3 | (p1, {C4, C5})
  K10 | s2 | (p1, {C6, C8, C9})
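Some rows of the table can be recomputed with a naive sketch that takes least common ancestors by walking up the taxonomy. The `PARENT` map is an assumption read off the order of the class labels in the figure residue ("T" is the top element), and this ancestor walk is a simple stand-in for the RMQ-based computation, not the authors' implementation.

```python
from functools import reduce

# Assumed taxonomy: child -> parent, "T" being the top element.
PARENT = {
    "C12": "T", "C15": "T",
    "C10": "C12", "C11": "C12", "C13": "C15", "C14": "C15",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C6": "C13", "C7": "C14", "C8": "C14", "C9": "C14",
}
DESCRIPTIONS = {
    "s1": {"C1", "C2", "C7"}, "s2": {"C6", "C8", "C9"}, "s3": {"C4", "C5"},
    "s4": {"C4", "C7", "C8"}, "s5": {"C8", "C9"},
}

def ancestors(c):
    """c followed by all of its ancestors up to the top element."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lca(c1, c2):
    up = set(ancestors(c1))
    return next(c for c in ancestors(c2) if c in up)

def minimal(classes):
    """Drop every class that is a strict ancestor of another one in the set."""
    return {c for c in classes
            if not any(c != d and c in ancestors(d) for d in classes)}

def meet(d1, d2):
    """Similarity of two descriptions: most specific pairwise LCAs."""
    return minimal({lca(a, b) for a in d1 for b in d2})

def intent(entities):
    return reduce(meet, (DESCRIPTIONS[e] for e in entities))

def extent(description):
    """Entities having, for each class, that class or one of its descendants."""
    return {e for e, d in DESCRIPTIONS.items()
            if all(any(c in ancestors(x) for x in d) for c in description)}

print(sorted(intent({"s1", "s4"})))  # ['C12', 'C7'], the intent of K3
print(sorted(extent({"C8"})))        # ['s2', 's4', 's5'], the extent of K4
```

With the assumed taxonomy, the intent of {s1, s2, s4, s5} also comes out as {C14}, in line with row K1.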
Experimentation
Experiments on Linked Data:
  Dataset | $|G|$ | $|\mathcal{T}|$ | $|Leaves(\mathcal{T})|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
  DBLP | 5293 | 33207 | 33198 | 10134 | 45s | 21s
  Biomedical Data | 63 | 1490 | 933 | 1725582 | 145s | 162s
Experiments with numerical data from Bilkent University (times as extracted; decimal points may have been lost):
  Dataset | entities | attributes | $|G|$ | $|\mathcal{T}|$ | $|Leaves(\mathcal{T})|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
  BK | 96 | 5 | 35 | 626 | 10 | 840897 | 37 sec | 42 sec
  LO | 16 | 7 | 16 | 224 | 26 | 1875 | 0043 sec | 0088 sec
  NT | 131 | 3 | 131 | 140 | 6 | 128624 | 36 sec | 68 sec
  PO | 60 | 16 | 22 | 1236 | 58 | 416837 | 49 sec | 57 sec
  PT | 5000 | 49 | 22 | 4084 | 60 | 452316 | 50 sec | 38 sec
  PW | 200 | 11 | 94 | 436 | 21 | 1148656 | 60 sec | 49 sec
  PY | 74 | 28 | 36 | 340 | 53 | 771569 | 46 sec | 40 sec
  QU | 2178 | 4 | 44 | 8212 | 8 | 783013 | 28 sec | 30 sec
  TZ | 186 | 61 | 31 | 626 | 88 | 650041 | 58 sec | 43 sec
  VY | 52 | 4 | 52 | 202 | 15 | 202666 | 59 sec | 116 sec
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from a domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice, levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
  KID | Extent | Intent
  K1 | s1, s2, s4, s5 | (p1, {C14})
  K2 | s1, s3, s4 | (p1, {C12})
  K3 | s1, s4 | (p1, {C7, C12})
  K4 | s2, s4, s5 | (p1, {C8})
  K5 | s3, s4 | (p1, {C4})
  K6 | s1 | (p1, {C1, C2, C7})
  K8 | s2, s5 | (p1, {C8, C9})
  K7 | s4 | (p1, {C4, C7, C8})
  K9 | s3 | (p1, {C4, C5})
  K10 | s2 | (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }
[Figure: graph pattern ?paper --dc:creator--> ?author; ?paper --dc:title--> ?title; ?paper --dc:subject--> ?keywords]
Mapping $\mu : V \rightarrow U$
Given the set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification
[Figure: RDF graph NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:subject--> Classification]
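How a graph pattern gives rise to mappings can be illustrated with a tiny matcher over the example graph. This is a didactic sketch with hypothetical function names, not a SPARQL engine: it ignores FILTER, prefixes and literals, and represents a mapping as a plain dictionary from variables to terms.

```python
GRAPH = {
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
}
PATTERN = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

def is_var(term):
    return term.startswith("?")

def match_triple(pattern, triple, mu):
    """Extend mapping mu so that pattern matches triple, or return None."""
    mu = dict(mu)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the value it was already bound to.
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu

def solutions(patterns, graph):
    """All mappings under which every triple pattern occurs in the graph."""
    mappings = [{}]
    for pattern in patterns:
        extended = []
        for mu in mappings:
            for triple in graph:
                candidate = match_triple(pattern, triple, mu)
                if candidate is not None:
                    extended.append(candidate)
        mappings = extended
    return mappings

for mu in solutions(PATTERN, GRAPH):
    print(mu)
```

The single solution binds ?keywords to Classification, which is the mapping given as an example on the slide.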
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
Query result:
  ?title | ?keyword | ?author
  title1 | RDF | author1
  title2 | Web Crawling | author1
  title2 | Web Search Engines | author2
  title2 | RDF | author1
  title2 | RDF | author2
  title2 | Web Crawling | author1
• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title_1, title_2, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{author_1, author_2, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
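The effect of VIEW BY can be sketched as a grouping of the result rows: values of the view variable become the entities, and the values of all other variables become their attributes. The function `view_by` and the three sample rows (a subset of the result table) are illustrative assumptions, not the authors' implementation.

```python
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]

def view_by(rows, view_var):
    """Build a formal context from query results for a chosen view variable.

    Entities are the values of view_var; attributes are (variable, value)
    pairs taken from the remaining columns of the same row.
    """
    context = {}
    for row in rows:
        attrs = context.setdefault(row[view_var], set())
        for var, value in row.items():
            if var != view_var:
                attrs.add((var, value))
    return context

by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
for entity in sorted(by_title):
    print(entity, sorted(by_title[entity]))
for entity in sorted(by_author):
    print(entity, sorted(by_author[entity]))
```

Calling the same function with "author" instead of "title" gives the second perspective on the same answers without re-running the query.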
Views from Different Perspectives
Example
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
Example
  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern ?x --rdf:type--> Person; ?x --dbp:birthPlace--> Berlin; ?x --dbp:birthDate--> value ≤ 1900]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }
French Films
[Figure: graph pattern ?x --dcterms:subject--> FrenchFilm]
  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm
  }
[Figure: graph pattern ?x --rdf:type--> Film; ?x --dbo:hasCountry--> France]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France
  }
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting denitions from implications and association rules
sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data
Rule Condence Support Meaning
d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a
Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-
rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and
Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071
sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b
Mehwish Alam Interactive Knowledge Discovery over Web of Data 38
Possible Scenario [Alam et al IJCAI 2015]
ReferenceUniverse
FormalContext
MiningImplications
RankingImplications
Can a rule be adenitionX rdquo Y
Yes
CompleteData
Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov
Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 39
Experimentation
Dataset Cars Videogames Smartphones Countries
Dataset building conditions
Restriction dcsubject dcsubject dcsubject rdftype
dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype
dcsubject dcsubject dcsubject dcsubject
bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank
Front_Person_Shooters computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset Cars Videogames Smartphones Countries
Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982
Dataset characteristics
Mehwish Alam Interactive Knowledge Discovery over Web of Data 40
Evaluation
sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall
sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions
00 02 04 06 08 10
06
07
08
09
10
Recall
Pre
cisi
on
CarsSmartphonesCountriesVideogames
Mehwish Alam Interactive Knowledge Discovery over Web of Data 41
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 42
Mehwish Alam Interactive Knowledge Discovery over Web of Data 43
6DataAnalysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 44
RV-Xplorer (Software Demo)
sbquo Main Topic of Research of a team
sbquo Altering Navigation Space
sbquo Navigation Across Point of Views
sbquo Hide non-interesting parts of lattice
sbquo Search Capabilities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 45
Association Rules and Implications
Let pG M I q be a formal context and X Y Ď Msbquo X Ntilde Y is an association rule with
- Support σpX Ntilde Y q ldquo|X 1XY 1|
|G |
- Confidence conf pX Ntilde Y q ldquo|X 1XY 1|
|X 1|
sbquo The implication X ugraventilde Y holds i X 1 Ď Y 1
Association Rulem2 Ntilde m3
σpm2 Ntilde m3q ldquo 35 ldquo 60conf pm2 Ntilde m3q ldquo 34 ldquo 75
Implication
m3 ugraventilde m2
conf pm3 ugraventilde m2q ldquo 33 ldquo 100
m1 m2 m3
g1 ˆ
g2 ˆ ˆ
g3 ˆ ˆ
g4 ˆ ˆ
g5 ˆ ˆ ˆ
Mehwish Alam Interactive Knowledge Discovery over Web of Data 12
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Complex Data
Formal Concept Analysis cannot deal with such complex data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 13
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5
sbquo δpg1q ldquo xr5 5s r7 7s r6 6sy
sbquo δpg2q ldquo xr6 6s r8 8s r4 4sy
sbquo ra1 b1s[ra2 b2s ldquo rminpa1 a2qmaxpb1 b2qs
sbquo δpg1q [ δpg2q ldquo xr5 6s r7 8s r4 6sy
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: reduced taxonomy. ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]
Euler tour of the taxonomy with the depth array D:
index:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
node:   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
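The Euler tour and depth array shown on this slide can be reproduced with a few lines of Python. This is only a sketch: the name TOP for the root, the dictionary encoding of the tree, and the helper names euler_tour, rmq and lca are assumptions made for illustration, and the linear-scan rmq stands in for whatever preprocessing the actual implementation uses.

```python
from typing import Dict, List, Tuple

# Reduced taxonomy from the slide, as parent -> ordered children.
# "TOP" is a stand-in name for the root of the taxonomy.
TREE: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree: Dict[str, List[str]], root: str) -> Tuple[List[str], List[int]]:
    """Return the visited nodes and their depths along an Euler tour."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            # Returning from a child, the parent is recorded again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths: List[int], i: int, j: int) -> int:
    """1-based position of the leftmost minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    window = depths[lo - 1:hi]
    return lo + window.index(min(window))


NODES, DEPTHS = euler_tour(TREE, "TOP")

# First position of every node in the tour (1-based, as on the slide).
FIRST: Dict[str, int] = {}
for position, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, position)


def lca(a: str, b: str) -> str:
    """Lowest common ancestor = shallowest node between the two positions."""
    return NODES[rmq(DEPTHS, FIRST[a], FIRST[b]) - 1]


print(DEPTHS)
print(lca("C1", "C5"))  # C12
print(lca("C7", "C9"))  # C13
```

The scan in rmq costs time linear in the window; a precomputed structure such as a sparse table would answer each query faster at the price of extra memory.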
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}
Positions in the Euler tour: A = {4, 12, 20}, B = {4, 18, 22}
A ∪ B = {4A, 4B, 12A, 18B, 20A, 22B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. {C1, C13}
Positions 8 (C12) and 15 (⊤) are ancestors of C1, and positions 19 and 21 both denote C13, so only the most specific classes are kept.
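The worked example above can be checked with a short Python sketch. The arrays are copied from the slide; the names TOP, rmq, lca and meet are hypothetical. One assumption goes beyond the slide: only consecutive positions coming from different sets are paired, since a pair taken from the same set would not yield a common ancestor of an element of A and an element of B (in the slide's example every consecutive pair is mixed anyway).

```python
from typing import Dict, List, Set

# Euler tour and depth array copied from the slide ("TOP" names the root).
NODES: List[str] = (
    "TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
    "TOP C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP"
).split()
DEPTHS: List[int] = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
                     1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First 1-based position of each class in the tour.
FIRST: Dict[str, int] = {}
for position, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, position)


def rmq(i: int, j: int) -> int:
    """1-based position of the leftmost minimum depth between i and j."""
    lo, hi = min(i, j), max(i, j)
    window = DEPTHS[lo - 1:hi]
    return lo + window.index(min(window))


def lca(x: str, y: str) -> str:
    return NODES[rmq(FIRST[x], FIRST[y]) - 1]


def meet(a: Set[str], b: Set[str]) -> Set[str]:
    """Similarity of two anti-chains: most specific common ancestors."""
    merged = sorted([(FIRST[c], "A") for c in a] + [(FIRST[c], "B") for c in b])
    candidates: Set[str] = set()
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:  # assumption: only mixed pairs are considered
            candidates.add(NODES[rmq(p, q) - 1])
    # Drop every candidate that is an ancestor of another candidate.
    return {
        c for c in candidates
        if not any(d != c and lca(c, d) == c for d in candidates)
    }


print(sorted(meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

Sorting the merged positions once means that only neighbouring positions need to be compared, rather than every pair drawn from A and B.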
Resulting Pattern Concept Lattice
[Figure: pattern concept lattice, drawn in levels from top to bottom: K0 | K1 K2 | K4 K3 K5 | K8 | K10 K7 K6 K9 | K11]
Pattern Concept lattice
KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data
Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)' ⊆ (m2)' means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach
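Inter-ordinal scaling, mentioned above, turns one numerical attribute into binary attributes of the form "m <= t" and "m >= t". The sketch below is an illustration only: the values are a small toy column, not the Bilkent data, and the function and attribute names are assumptions.

```python
from typing import Dict, Set

# Toy numerical column m for five entities (illustrative values only).
VALUES: Dict[str, int] = {"g1": 5, "g2": 6, "g3": 4, "g4": 4, "g5": 5}


def interordinal_scale(values: Dict[str, int]) -> Dict[str, Set[str]]:
    """Binary context with attributes 'm<=t' and 'm>=t' for every observed t."""
    thresholds = sorted(set(values.values()))
    context: Dict[str, Set[str]] = {}
    for entity, x in values.items():
        attrs = {f"m<={t}" for t in thresholds if x <= t}
        attrs |= {f"m>={t}" for t in thresholds if x >= t}
        context[entity] = attrs
    return context


def extent(context: Dict[str, Set[str]], attribute: str) -> Set[str]:
    """Entities that carry the given binary attribute."""
    return {entity for entity, attrs in context.items() if attribute in attrs}


CONTEXT = interordinal_scale(VALUES)
print(sorted(CONTEXT["g3"]))             # ['m<=4', 'm<=5', 'm<=6', 'm>=4']
print(sorted(extent(CONTEXT, "m<=4")))   # ['g3', 'g4']
print(sorted(extent(CONTEXT, "m<=5")))   # ['g1', 'g3', 'g4', 'g5']
```

The extent of "m<=4" is contained in the extent of "m<=5", which is the kind of ordering between scaled attributes that the structured attribute sets exploit.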
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice, drawn in levels from top to bottom: K0 | K1 K2 | K4 K3 K5 | K8 | K10 K7 K6 K9 | K11]
KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
C7 occurs in the intents of K3, K6 and K7.
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Roadmap
Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries
Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern of the query. ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]
Mapping µ: V → U
Given the set of URIs U and a set of variables V, a mapping µ is a partial function µ: V → U, e.g. µ1: ?keywords → Classification
[Figure: matching RDF graph. NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1
VIEW BY names the variable whose bindings become the entities of the formal context; the remaining variables provide the attributes.
• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2}
• M1 = µ(Av1) = {author1, author2}
• M2 = µ(Av2) = {Web Crawling, RDF}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
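How a VIEW BY clause could turn query answers into a formal context can be sketched in Python. The rows are the answers listed on the slide; the function name view_by and the encoding of attributes as (variable, value) pairs are assumptions of this sketch, not the LBVA implementation.

```python
from typing import Dict, List, Set, Tuple

# Answers of the query, one dictionary of variable bindings per row.
ROWS: List[Dict[str, str]] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]

Attribute = Tuple[str, str]  # (variable name, bound value)


def view_by(rows: List[Dict[str, str]], object_var: str) -> Dict[str, Set[Attribute]]:
    """Group the bindings of all other variables under each binding of object_var."""
    context: Dict[str, Set[Attribute]] = {}
    for row in rows:
        attrs = context.setdefault(row[object_var], set())
        for var, value in row.items():
            if var != object_var:
                attrs.add((var, value))
    return context


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(sorted(by_title["title1"]))
# [('author', 'author1'), ('keyword', 'RDF')]
print(sorted(by_author["author2"]))
# [('keyword', 'RDF'), ('keyword', 'Web Search Engines'), ('title', 'title2')]
```

Changing the second argument from "title" to "author" gives the same answers organised from another perspective, which is the effect of changing the VIEW BY variable.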
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: x --rdf:type--> Person, x --dbp:birthPlace--> Berlin, x --dbp:birthDate--> a date ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: x --dcterms:subject--> FrenchFilm]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
[Figure: x --rdf:type--> Film, x --dbo:hasCountry--> France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>
Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom
Formal context with columns A (a, b), B (c), C (d), D (e), E (f, g):
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
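The scaling step described above, where every (predicate, object) pair becomes a binary attribute of its subject, can be sketched as follows. The Person1 triples are the ones listed on the slide; the two Person6 triples are an assumption added only to show an entity with missing crosses, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]      # (subject, predicate, object)
Attribute = Tuple[str, str]        # (predicate, object)

TRIPLES: List[Triple] = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
    # Assumed for illustration: an entity described by categories only.
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]


def to_context(triples: List[Triple]) -> Dict[str, Set[Attribute]]:
    """One entity per subject, one binary attribute per (predicate, object)."""
    context: Dict[str, Set[Attribute]] = {}
    for s, p, o in triples:
        context.setdefault(s, set()).add((p, o))
    return context


def cross_table(context: Dict[str, Set[Attribute]]) -> Dict[str, List[str]]:
    """Rows of 'x' and '.' over the sorted list of all attributes."""
    attributes = sorted({a for attrs in context.values() for a in attrs})
    return {
        entity: ["x" if a in attrs else "." for a in attributes]
        for entity, attrs in context.items()
    }


CONTEXT = to_context(TRIPLES)
TABLE = cross_table(CONTEXT)
for entity, row in TABLE.items():
    print(entity, " ".join(row))
# Person1 x x x x
# Person6 . x x .
```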
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y holds and Y → X has high confidence, the data may be incomplete
Rule         Confidence  Support  Meaning
d ⟹ c        100%        5        Every scientist has won a Turing Award
c ⟹ d        100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d     100%        2        Everyone whose field is computer science is a Turing Award-winning scientist
c, d ⟹ a, b  100%        5        All scientists who have won the Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award
Association rules for the running example
• Example
- c, d ⟹ a, b
- conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
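The confidence figures in the table can be recomputed on a small context. The context below is an assumption: it keeps only attributes a to d and assigns them so that the counts match the slide (seven entities with a and b, five of which also have c and d); the helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Assumed toy context restricted to attributes a, b, c, d.
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs: FrozenSet[str]) -> Set[str]:
    """Entities that have every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(premise: FrozenSet[str], conclusion: FrozenSet[str]) -> float:
    """Share of entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    if not covered:
        return 0.0
    return len(extent(premise | conclusion)) / len(covered)


AB = frozenset({"a", "b"})
CD = frozenset({"c", "d"})

print(confidence(CD, AB))            # 1.0: c, d => a, b is an implication
print(round(confidence(AB, CD), 2))  # 0.71: a, b -> c, d is only an association rule

# Entities that block the definition a, b == c, d and may need completion.
TO_COMPLETE = extent(AB) - extent(CD)
print(sorted(TO_COMPLETE))           # ['Person6', 'Person7']
```

If a domain expert accepts a, b ≡ c, d as a definition, the entities in TO_COMPLETE are the ones whose descriptions would receive the missing attributes c and d.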
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → (Yes) Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank
FPS = Front_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset         Cars   Videogames  Smartphones  Countries
Subjects        529    655         363          3153
Objects         1291   3265        495          8315
Triples         12519  20146       4710         50000
Concepts        14657  31031       1232         13754
Exec. time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation serves as the ground truth, so each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Complex Data
Formal Concept Analysis cannot deal directly with such complex data:
• Numeric Data
    m1  m2  m3
g1  5   7   6
g2  6   8   4
g3  4   8   5
g4  4   9   8
g5  5   8   5
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data
Pattern Structures
[Ganter and Kuznetsov 2001]
• A pattern structure (G, (D, ⊓), δ) is composed of
- G, a set of objects
- (D, ⊓), the set of descriptions with a similarity operation over them
- δ, a mapping such that δ(g) ∈ D describes object g
    m1  m2  m3
g1  5   7   6
g2  6   8   4
g3  4   8   5
• δ(g1) = ⟨[5, 5], [7, 7], [6, 6]⟩
• δ(g2) = ⟨[6, 6], [8, 8], [4, 4]⟩
• [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]
• δ(g1) ⊓ δ(g2) = ⟨[5, 6], [7, 8], [4, 6]⟩
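The interval similarity operation above is small enough to check directly in Python. The type alias and the function names delta, meet_interval and meet are assumptions made for this sketch.

```python
from typing import Tuple

Interval = Tuple[int, int]
Description = Tuple[Interval, ...]


def delta(row: Tuple[int, ...]) -> Description:
    """Describe an object by one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in row)


def meet_interval(x: Interval, y: Interval) -> Interval:
    """Smallest interval containing both x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def meet(d1: Description, d2: Description) -> Description:
    """Component-wise similarity of two interval descriptions."""
    return tuple(meet_interval(x, y) for x, y in zip(d1, d2))


g1 = delta((5, 7, 6))
g2 = delta((6, 8, 4))
print(meet(g1, g2))  # ((5, 6), (7, 8), (4, 6))
```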
Pattern Structures
The Galois connection for (G, (D, ⊓), δ) is defined as
• The maximal description representing the similarity of a set of objects
A□ = ⊓_{g ∈ A} δ(g) for A ⊆ G
• The maximal set of objects sharing a given description
d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)
Pattern Concept
• {g1, g2}□ = ⟨[5, 6], [7, 8], [4, 6]⟩
• ⟨[5, 6], [7, 8], [4, 6]⟩□ = {g1, g2}
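Both derivation operators can be tried on the three-object table used in the preceding slide. This is a minimal sketch under stated assumptions: the function names are hypothetical, and subsumption d ⊑ e is tested as d ⊓ e = d, the usual way of ordering descriptions in a pattern structure.

```python
from functools import reduce
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[int, int]
Description = Tuple[Interval, ...]

# delta for the three-object numeric table (g1, g2, g3).
DELTA: Dict[str, Description] = {
    "g1": ((5, 5), (7, 7), (6, 6)),
    "g2": ((6, 6), (8, 8), (4, 4)),
    "g3": ((4, 4), (8, 8), (5, 5)),
}


def meet(d1: Description, d2: Description) -> Description:
    """Component-wise convex hull of two interval descriptions."""
    return tuple((min(x[0], y[0]), max(x[1], y[1])) for x, y in zip(d1, d2))


def subsumed(d: Description, e: Description) -> bool:
    """d is subsumed by e when their similarity is d itself."""
    return meet(d, e) == d


def common_description(objects: Iterable[str]) -> Description:
    """First operator: similarity of all descriptions of a non-empty object set."""
    return reduce(meet, (DELTA[g] for g in objects))


def objects_sharing(d: Description) -> Set[str]:
    """Second operator: every object whose description is subsumed under d."""
    return {g for g, e in DELTA.items() if subsumed(d, e)}


d = common_description({"g1", "g2"})
print(d)                           # ((5, 6), (7, 8), (4, 6))
print(sorted(objects_sharing(d)))  # ['g1', 'g2']
```

Applying one operator after the other returns the starting set {g1, g2}, so the pair ({g1, g2}, ⟨[5, 6], [7, 8], [4, 6]⟩) is a pattern concept of this three-object table.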
Complex Data
• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object.
[Figure: Subject (U ∪ B) --predicate (U)--> Object (U ∪ B ∪ L)]
U = URI, B = Blank Nodes, L = Literal
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1
• $E_v = \{\text{?title}\}$
• $A_v = \{\text{?author}, \text{?keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{\text{author1}, \text{author2}, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
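The effect of VIEW BY on the result table can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `view_by` and the encoding of attributes as (column, value) pairs are hypothetical and not taken from the cited paper; the rows are the ones listed in the result table.

```python
from collections import defaultdict

# Result rows (title, keyword, author) as listed in the slide's table.
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")

def view_by(rows, column):
    """Hypothetical helper: values of `column` become the entities (G),
    values of the remaining columns become attributes (M1, M2)."""
    entity_index = COLUMNS.index(column)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != entity_index:
                context[row[entity_index]].add((COLUMNS[i], value))
    return dict(context)

by_title = view_by(rows, "title")    # VIEW BY ?title
by_author = view_by(rows, "author")  # VIEW BY ?author
for entity in sorted(by_title):
    print(entity, sorted(by_title[entity]))
```

The same rows yield a different context, and therefore a different lattice, depending on which variable is chosen as the entity.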
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern, with ?x linked to Person (rdf:type), Berlin (dbp:birthPlace) and a date ≤ 1900 (dbp:birthDate)]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: two graph patterns for the same set, one with ?x linked to FrenchFilm (dcterms:subject), the other with ?x linked to Film (rdf:type) and France (dbo:hasCountry)]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>
Predicates                Objects
Index  URI                Index  URI
A      dc:subject         a      dbpc:Computer_Scientists
                          b      dbpc:Turing_Award_Laureates
B      dbp:award          c      dbp:TuringAward
C      rdf:type           d      dbo:Scientist
D      dbp:field          e      dbp:Computer Sciences
E      dbp:birthPlace     f      dbo:UnitedStates
                          g      dbo:UnitedKingdom
Formal context over the columns a, b (A), c (B), d (C), e (D), f, g (E)
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
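As a rough sketch of the scaling step, each distinct (predicate, object) pair can be treated as one binary attribute. The function name `scale` and the tuple encoding are assumptions made for illustration; only the four Person1 triples shown on the slide are used.

```python
# Triples from the slide, written as (subject, predicate, object).
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

def scale(triples):
    """Assumed scaling: every distinct (predicate, object) pair
    becomes one binary attribute of the subject."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context

context = scale(triples)
attributes = sorted({attr for attrs in context.values() for attr in attrs})

# Print the cross table: one row per subject, one column per attribute.
for subject in sorted(context):
    row = " ".join("x" if attr in context[subject] else "." for attr in attributes)
    print(subject, row)
```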
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete
Rule                     Confidence  Support  Meaning
$d \Rightarrow c$        100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$        100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$     100%        2        All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$  100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$          71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
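The confidence figure and the entities flagged for completion can be reproduced with a small Python sketch. The context below is an assumption: it is restricted to attributes a to d and chosen only to agree with the supports and confidences reported in the rule table, since the exact cross positions are not recoverable from the slide.

```python
# Assumed context (attributes a-d only), consistent with the reported statistics.
context = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
context.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})

def extent(attrs):
    """Entities whose description contains all attributes in attrs."""
    return {g for g, desc in context.items() if attrs <= desc}

def support(attrs):
    return len(extent(attrs))

def confidence(premise, conclusion):
    """Share of entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)

def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)

print(confidence({"c", "d"}, {"a", "b"}))            # implication holds
print(round(confidence({"a", "b"}, {"c", "d"}), 2))  # high-confidence converse
print(sorted(needs_completion({"a", "b"}, {"c", "d"})))
```

If the expert accepts $a, b \equiv c, d$ as a definition, the entities returned by `needs_completion` are the ones whose descriptions would receive the missing attributes.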
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow from Reference Universe to Formal Context, Mining Implications and Ranking Implications, followed by the test "Can a rule be a definition $X \equiv Y$?"; if yes, Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars              Videogames   Smartphones        Countries
Restriction  dc:subject        dc:subject   dc:subject         rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones   Country
Predicates   rdf:type          rdf:type     rdf:type           rdf:type
             dc:subject        dc:subject   dc:subject         dc:subject
             bodyStyle         cp           manufacturer       language
             transmission      developer    operativeSystem    governmentType
             assembly          requirement  developer          leaderType
             designer          genre        cpu                foundingDate
             layout            releaseDate                     gdpPppRank
FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of lattice
• Search Capabilities
Complex Data
Formal Concept Analysis cannot deal with such complex data:
     m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5
Numeric Data
Graphs and Molecular Structure
Syntactic Tree
Linked Data
Pattern Structures
[Ganter and Kuznetsov, 2001]
• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$
     m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
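The interval similarity operation can be checked on the table above with a minimal Python sketch. The function names `delta` and `meet` are illustrative choices; the data and the expected result are the ones given on the slide.

```python
# Numeric table from the slide: object -> (m1, m2, m3).
table = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}

def delta(g):
    """Description of g: one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in table[g])

def meet(d1, d2):
    """Component-wise similarity: the smallest interval covering both inputs."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))

print(meet(delta("g1"), delta("g2")))
```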
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as
• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
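Both derivation operators can be sketched for interval descriptions as follows. The names `describe` and `objects_of` are hypothetical, and subsumption is implemented through the usual identity $d \sqsubseteq e \iff d \sqcap e = d$; the table is the three-object example used for the pattern concept.

```python
from functools import reduce

table = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}

def delta(g):
    return tuple((v, v) for v in table[g])

def meet(d1, d2):
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))

def subsumed(d, e):
    """d is subsumed by e when d meet e equals d."""
    return meet(d, e) == d

def describe(objects):
    """A -> maximal common description (A must be non-empty in this sketch)."""
    return reduce(meet, (delta(g) for g in sorted(objects)))

def objects_of(d):
    """d -> maximal set of objects whose description is covered by d."""
    return {g for g in table if subsumed(d, delta(g))}

d = describe({"g1", "g2"})
print(d, sorted(objects_of(d)))
```

Applying `describe` and then `objects_of` returns the starting set, which is what makes the pair ({g1, g2}, d) a pattern concept.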
Complex Data
     m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5
Interval Pattern Structures [Kaytoue et al., 2011]
Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]
Syntactic Tree [Leeuwenberg et al., 2015]
Linked Data
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) linked by a predicate ($U$) to Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: NapoliLS97 linked to Amedeo_Napoli (dc:creator), title1 (dc:title) and Classification (dc:subject)]
To avoid confusion, we will call objects in FCA entities.
RDF to Entity-Descriptions
tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources
Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: taxonomy with root ⊤ and classes C1, C2 and C4 to C15]
ACM Computing Classification Taxonomy |T|
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]
Euler tour of the taxonomy, built one step at a time, with the depth array D:
Index   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
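The construction of the tour and its use for lowest common ancestors can be sketched in Python. Assumptions to note: the tree shape is read off the node and depth sequences on the slide, the root ⊤ is written `TOP`, positions are 0-based here, and a plain linear scan stands in for a real RMQ structure such as a sparse table.

```python
# Taxonomy as read from the Euler tour on the slide (root written as "TOP").
children = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Depth-first walk that records a node on entry and after each child."""
    nodes, depths = [], []
    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)
    visit(root, 0)
    return nodes, depths

nodes, depths = euler_tour("TOP")
first = {}
for position, node in enumerate(nodes):
    first.setdefault(node, position)  # first occurrence of each node

def lca(x, y):
    """LCA = node of minimum depth between the first occurrences of x and y."""
    i, j = sorted((first[x], first[y]))
    k = min(range(i, j + 1), key=depths.__getitem__)  # naive RMQ by linear scan
    return nodes[k]

print(depths)
print(lca("C1", "C5"), lca("C5", "C7"), lca("C7", "C8"))
```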
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?
Each class is replaced by its position in the Euler tour of the taxonomy, and RMQ is evaluated on the depth array D:
$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
Positions 4, 8, 15, 19 and 21 hold C1, C12, ⊤, C13 and C13. Discarding the ancestors of C1 (C12 and ⊤) and the repeated C13 leaves the anti-chain {C1, C13}.
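The worked example can be replayed with the following Python sketch. It is an illustration under assumptions, not the authors' implementation: the node and depth arrays are copied from the slide (root ⊤ written as `TOP`), positions are 1-based as on the slide, RMQ is a linear scan, and only neighbouring positions coming from different sets are compared before non-minimal classes are dropped.

```python
# Euler tour and depth array copied from the slide (1-based positions).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
         "TOP C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

FIRST, LAST = {}, {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)
    LAST[node] = pos

def rmq(i, j):
    """1-based position of the minimum depth in D[i..j] (leftmost on ties)."""
    return min(range(i, j + 1), key=lambda k: DEPTH[k - 1])

def is_ancestor(x, y):
    """True if x is a proper ancestor of y in the taxonomy."""
    return x != y and FIRST[x] <= FIRST[y] <= LAST[x]

def intersect(a, b):
    """Similarity of two anti-chains of classes."""
    tagged = sorted([(FIRST[c], "A") for c in a] + [(FIRST[c], "B") for c in b])
    candidates = set()
    for (i, src_i), (j, src_j) in zip(tagged, tagged[1:]):
        if src_i != src_j:  # only pairs mixing A and B contribute
            candidates.add(NODES[rmq(i, j) - 1])
    # Keep the most specific classes only: drop every ancestor of another candidate.
    return {c for c in candidates
            if not any(is_ancestor(c, other) for other in candidates)}

print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

Restricting the scan to neighbouring positions keeps the number of RMQ calls linear in the size of the two sets, instead of comparing every class of A with every class of B.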
Resulting Pattern Concept Lattice
Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: taxonomy with root ⊤ and classes C1, C2 and C4 to C15]
[Figure: pattern concept lattice over the concepts K0 to K11, drawn in levels: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
Pattern Concept lattice
KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data
Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach ()
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice over the concepts K0 to K11, drawn in levels: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5
sbquo δpg1q ldquo xr5 5s r7 7s r6 6sy
sbquo δpg2q ldquo xr6 6s r8 8s r4 4sy
sbquo ra1 b1s[ra2 b2s ldquo rminpa1 a2qmaxpb1 b2qs
sbquo δpg1q [ δpg2q ldquo xr5 6s r7 8s r4 6sy
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)
U = URI, B = Blank Nodes, L = Literal
Example of RDF Graph for Publications
  NapoliLS97 --dc:creator--> Amedeo_Napoli
  NapoliLS97 --dc:title--> title1
  NapoliLS97 --dc:subject--> Classification
To avoid confusion, we will call the objects in FCA entities.
RDF to Entity-Descriptions
  tid  Subject  Predicate        Object       Provenance
  t1   s1       dc:subject       o11          DBLP
  t2   s2       dc:subject       o16          DBLP
  t3   s1       rdf:type         Publication  DBLP
  t4   o11      rdfs:subClassOf  C1           ACCS
  t5   o12      rdfs:subClassOf  C2           ACCS
  t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

  Entities S  Descriptions d
  s1          (dc:subject, {C1, C2, C7})
  s2          (dc:subject, {C6, C8, C9})
  s3          (dc:subject, {C4, C5})
  s4          (dc:subject, {C4, C7, C8})
  s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy $T$, with root $\top$ and classes C1, C2 and C4 to C15]
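One way to read the step from triples to entity descriptions is sketched below in Python. It assumes that the description of an entity collects, for each `dc:subject` value, the taxonomy classes reachable through one `rdfs:subClassOf` step; this reading and the function name `entity_descriptions` are assumptions made for illustration, and only the six sample triples of the slide are used.

```python
from collections import defaultdict

# Sample triples from the slide: (subject, predicate, object, provenance).
triples = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]


def entity_descriptions(triples, predicate="dc:subject"):
    """Map each entity to the set of classes its `predicate` values belong to."""
    parents = defaultdict(set)
    for s, p, o, _source in triples:
        if p == "rdfs:subClassOf":
            parents[s].add(o)
    descriptions = defaultdict(set)
    for s, p, o, _source in triples:
        if p == predicate:
            # Values without a schema triple contribute no class.
            descriptions[s] |= parents.get(o, set())
    return dict(descriptions)


print(entity_descriptions(triples))
```

With the sample triples only s1 receives a class (C1), because the sample contains no schema triple for o16; the full descriptions in the table rely on triples that the slide does not list.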
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree. $\top$ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]
Euler tour of the taxonomy with the depth array D:
  index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
  node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
  D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
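The construction of the depth array can be reproduced with a short Python sketch. The tree is the one read off the Euler tour on this slide, with the root written as `"TOP"`; the range minimum is computed by a plain linear scan, which is an assumption made for brevity (a sparse table or a similar preprocessed structure would normally replace it), and the function names are hypothetical.

```python
# Taxonomy of the slide as a child list; "TOP" stands for the root.
tree = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths, re-recording a node after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour(tree, "TOP")


def lca(u, v):
    """Least common ancestor: node of minimal depth between the first visits of u and v."""
    i, j = sorted((nodes.index(u), nodes.index(v)))
    k = min(range(i, j + 1), key=depths.__getitem__)  # naive range minimum query
    return nodes[k]


print(depths)
print(lca("C7", "C9"), lca("C1", "C5"), lca("C5", "C7"))  # C13 C12 TOP
```

The printed depth list has 25 entries and agrees with the array D of the slide.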
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?
Positions in the Euler tour (indices 1 to 25 of the depth array D):
  A = {4, 12, 20}, B = {4, 18, 22}
  $A \cup B$ = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
  RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
  Result = {4, 8, 15, 19, 21} = {4, 21}
Positions 8 and 15 hold C12 and $\top$, which are ancestors of C1 (position 4), and positions 19 and 21 both hold C13, so the intersection is {C1, C13}.
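The worked example can be replayed with the Python sketch below. The Euler tour and depth array are copied from the slide, with the root written as `"TOP"`. Two points are assumptions of this sketch rather than statements from the slides: only consecutive positions coming from different sets are queried (as the A/B tags suggest), and the final reduction keeps the most specific classes by discarding any class that is a proper ancestor of another candidate.

```python
# Euler tour and depth array copied from the slide (positions are 1-based).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def position(node):
    """1-based position of the first visit of a node in the Euler tour."""
    return NODES.index(node) + 1


def rmq(i, j):
    """1-based position of the minimal depth in D[i..j] (naive linear scan)."""
    return min(range(i, j + 1), key=lambda k: DEPTH[k - 1])


def candidate_positions(a, b):
    """RMQ answers for consecutive positions of the merged, tagged list."""
    tagged = sorted([(position(c), "A") for c in a] +
                    [(position(c), "B") for c in b])
    return [rmq(p, q) for (p, s), (q, t) in zip(tagged, tagged[1:]) if s != t]


def is_proper_ancestor(x, y):
    """x is a proper ancestor of y when their LCA is x and x differs from y."""
    i, j = sorted((position(x), position(y)))
    return x != y and NODES[rmq(i, j) - 1] == x


def intersect(a, b):
    """Most specific classes among the least common ancestors of A and B."""
    classes = {NODES[k - 1] for k in candidate_positions(a, b)}
    return {c for c in classes
            if not any(is_proper_ancestor(c, other) for other in classes)}


A, B = {"C1", "C5", "C8"}, {"C1", "C7", "C9"}
print(candidate_positions(A, B))  # [4, 8, 15, 19, 21]
print(intersect(A, B))            # the classes C1 and C13
```

The candidate positions match the five RMQ answers on the slide, and the reduced result corresponds to positions 4 and 21.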
Resulting Pattern Concept Lattice
[Figure: pattern concept lattice built from the entities s1 to s5 and the ACM taxonomy, drawn in levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
Pattern Concept lattice
  KID  Extent          Intent
  K1   s1, s2, s4, s5  (p1, {C14})
  K2   s1, s3, s4      (p1, {C12})
  K3   s1, s4          (p1, {C7, C12})
  K4   s2, s4, s5      (p1, {C8})
  K5   s3, s4          (p1, {C4})
  K6   s1              (p1, {C1, C2, C7})
  K8   s2, s5          (p1, {C8, C9})
  K7   s4              (p1, {C4, C7, C8})
  K9   s3              (p1, {C4, C5})
  K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
  Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
  DBLP             5293  33207  33198        10134    45s    21s
  Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

  Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
  BK       96        5           35   626   10           840897   37 sec     42 sec
  LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
  NT       131       3           131  140   6            128624   36 sec     68 sec
  PO       60        16          22   1236  58           416837   49 sec     57 sec
  PT       5000      49          22   4084  60           452316   50 sec     38 sec
  PW       200       11          94   436   21           1148656  60 sec     49 sec
  PY       74        28          36   340   53           771569   46 sec     40 sec
  QU       2178      4           44   8212  8            783013   28 sec     30 sec
  TZ       186       61          31   626   88           650041   58 sec     43 sec
  VY       52        4           52   202   15           202666   59 sec     116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from the domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering
[Figure: the pattern concept lattice with concepts K0 to K11; extents and intents of K1 to K10 are those listed under "Details of Pattern Concept lattice"]
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
Graph pattern of the query:
  ?paper --dc:creator--> ?author
  ?paper --dc:title--> ?title
  ?paper --dc:subject--> ?keywords
Mapping $\mu : V \to U$
Given the set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification
Matching RDF graph:
  NapoliLS97 --dc:creator--> Amedeo_Napoli
  NapoliLS97 --dc:title--> title1
  NapoliLS97 --dc:subject--> Classification
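How a mapping binds the variables of the graph pattern to terms of the RDF graph can be illustrated with a tiny pattern matcher in Python. This is a sketch under simplifying assumptions: terms are plain strings, variables start with `?`, the FILTER clause is ignored, and the function names `match_pattern` and `evaluate` are hypothetical; it does not stand in for a SPARQL engine.

```python
def is_variable(term):
    return term.startswith("?")


def match_pattern(pattern, triple, mapping):
    """Extend `mapping` so that `pattern` matches `triple`, or return None."""
    extended = dict(mapping)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None      # constant does not match
    return extended


def evaluate(patterns, graph):
    """Return all mappings that satisfy every triple pattern."""
    mappings = [{}]
    for pattern in patterns:
        next_mappings = []
        for mapping in mappings:
            for triple in graph:
                extended = match_pattern(pattern, triple, mapping)
                if extended is not None:
                    next_mappings.append(extended)
        mappings = next_mappings
    return mappings


graph = [
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
]
patterns = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]
for mu in evaluate(patterns, graph):
    print(mu)
```

On the example graph there is exactly one mapping, and it sends `?keywords` to `Classification`, as in the example $\mu_1$ of the slide.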
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

  ?title  ?keyword            ?author
  title1  RDF                 author1
  title2  Web Crawling        author1
  title2  Web Search Engines  author2
  title2  RDF                 author1
  title2  RDF                 author2
  title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2\}$
• $M_2 = \mu(A_{v_2}) = \{WebCrawling, RDF\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
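The effect of VIEW BY on the query answers can be sketched in Python: the variable named in VIEW BY provides the entities, and the bindings of the remaining variables provide their attributes. The function name `view_by` and the encoding of attributes as `(variable, value)` pairs are assumptions made for this illustration.

```python
# Answer rows of the query, copied from the slide.
rows = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
]


def view_by(rows, entity_var):
    """Group answers by `entity_var`; other bindings become (variable, value) attributes."""
    context = {}
    for row in rows:
        attributes = context.setdefault(row[entity_var], set())
        for var, value in row.items():
            if var != entity_var:
                attributes.add((var, value))
    return context


by_title = view_by(rows, "title")
by_author = view_by(rows, "author")
print(sorted(by_title))             # ['title1', 'title2']
print(sorted(by_title["title1"]))   # [('author', 'author1'), ('keyword', 'RDF')]
print(sorted(by_author))            # ['author1', 'author2']
```

Calling the same function with a different variable yields a context over a different set of entities, which is the change of perspective shown on the next slide.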
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
Graph pattern: ?x --rdf:type--> Person; ?x --dbp:birthPlace--> Berlin; ?x --dbp:birthDate--> ?d with ?d ≤ 1900
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
Graph pattern: ?x --dcterms:subject--> FrenchFilm
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
French Films, described through their type and country
Graph pattern: ?x --rdf:type--> Film; ?x --dbo:hasCountry--> France
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientists>

  Predicates             Objects
  Index  URI             Index  URI
  A      dc:subject      a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
  B      dbp:award       c      dbp:TuringAward
  C      rdf:type        d      dbo:Scientist
  D      dbp:field       e      dbp:Computer Sciences
  E      dbp:birthPlace  f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

[Table: formal context with the entities Person1 to Person7 as rows and the attributes a to g as columns, grouped under the predicates A to E; a cross (×) marks each attribute held by an entity]
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
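The scaling from triples to a formal context can be sketched as follows in Python: every distinct predicate and object pair becomes one attribute, and a cross is placed for each triple. Only the four Person1 triples listed on the slide are used, and the function name `formal_context` is hypothetical.

```python
# The four example triples for Person1 from the slide.
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def formal_context(triples):
    """Return (entities, attributes, incidence) with one attribute per (predicate, object)."""
    entities = sorted({s for s, _p, _o in triples})
    attributes = sorted({(p, o) for _s, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence


entities, attributes, incidence = formal_context(triples)
for entity in entities:
    row = ["x" if (entity, attribute) in incidence else "." for attribute in attributes]
    print(entity, " ".join(row))
```

With more entities, each row shows a cross only for the triples that exist for that entity, which is what the later completion step inspects.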
Getting definitions from implications and association rules
• If $X \Longrightarrow Y$ and $Y \Longrightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Longrightarrow Y$ and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

  Rule                      Confidence  Support  Meaning
  $d \Longrightarrow c$     100%        5        Every scientist has won a Turing Award
  $c \Longrightarrow d$     100%        5        Every person who has won a Turing Award is a scientist
  $e \Longrightarrow c,d$   100%        2        All the people having the field computer science are Turing Award winner scientists
  $c,d \Longrightarrow a,b$ 100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
  $a,b \to c,d$             71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Longrightarrow c, d$ and $c, d \Longrightarrow a, b$
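The confidence value and the two entities needing completion can be recomputed with a short Python sketch. The context below is a hypothetical reconstruction: it is chosen to agree with the rule statistics of this slide and the number of crosses per person, but the exact placement of e, f and g is an assumption. The function names are also hypothetical.

```python
# Hypothetical context consistent with the rule statistics on the slide.
context = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attributes):
    """Entities whose description contains all the given attributes."""
    return {g for g, description in context.items() if attributes <= description}


def confidence(premise, conclusion):
    """Fraction of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


def is_definition(x, y):
    """X and Y define each other when both implications hold with confidence 1."""
    return confidence(x, y) == 1.0 and confidence(y, x) == 1.0


ab, cd = {"a", "b"}, {"c", "d"}
print(round(confidence(ab, cd), 2))      # 0.71
print(confidence(cd, ab))                # 1.0
print(is_definition({"c"}, {"d"}))       # True
print(sorted(extent(ab) - extent(cd)))   # ['Person6', 'Person7']
```

The entities in the last line satisfy a and b but lack c and d; they are the ones whose descriptions would be completed if a, b ≡ c, d is accepted as a definition.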
Possible Scenario [Alam et al IJCAI 2015]
[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
               Cars              Videogames   Smartphones       Countries
  Restriction  dc:subject        dc:subject   dc:subject        rdf:type
               dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
  Predicates   rdf:type          rdf:type     rdf:type          rdf:type
               dc:subject        dc:subject   dc:subject        dc:subject
               bodyStyle         cp           manufacturer      language
               transmission      developer    operativeSystem   governmentType
               assembly          requirement  developer         leaderType
               designer          genre        cpu               foundingDate
               layout            releaseDate                    gdpPppRank
  FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
                 Cars   Videogames  Smartphones  Countries
  Subjects       529    655         363          3153
  Objects        1291   3265        495          8315
  Triples        12519  20146       4710         50000
  Concepts       14657  31031       1232         13754
  Exec time [s]  1732   1714        0.7          5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Pattern Structures
[Ganter and Kuznetsov 2001]sbquo A pattern structure pG pD[q δq is composed of
- G a set of objects- pD[q is the set of descriptions with similarity operation over them- δ a mapping such as δpgq P D describes object g
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5
sbquo δpg1q ldquo xr5 5s r7 7s r6 6sy
sbquo δpg2q ldquo xr6 6s r8 8s r4 4sy
sbquo ra1 b1s[ra2 b2s ldquo rminpa1 a2qmaxpb1 b2qs
sbquo δpg1q [ δpg2q ldquo xr5 6s r7 8s r4 6sy
Mehwish Alam Interactive Knowledge Discovery over Web of Data 14
Pattern Structures
The Galois connection for pG pD[q δq is dened as
sbquo The maximal description representing the similarity of a set of objects
A˝ldquo [gPAδpgq for A Ď G
sbquo The maximal set of objects sharing a given description
d˝ ldquo tg P G |d Ď δpgqu for d P pD[q
Pattern Concept
sbquo tg1 g2ulldquo xr5 6s r7 8s r4 6sy
sbquo xr5 6s r7 8s r4 6syl ldquo tg1 g2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 15
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER (
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }
[Figure: graph pattern of the query: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]
Mapping $\mu : V \rightarrow U$
Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification
[Figure: matching RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
title    keyword              author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1
• $E_v = \{title\}$
• $A_v = \{author, keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \ldots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
Views from Different Perspectives
Example
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
Example
    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
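The effect of the VIEW BY clause can be sketched in Python: the variable named in VIEW BY provides the entities of a formal context, and the values bound to the remaining variables become its attributes. The sketch below is a minimal illustration under that reading, with a naive concept enumeration; the names `build_context` and `concepts` are hypothetical, and the rows are copied from the result table of the title view as it appears in the transcript.

```python
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def build_context(rows, columns, view_by):
    """Entities come from the VIEW BY column, attributes from the other columns."""
    idx = columns.index(view_by)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[idx], set())
        for col, value in zip(columns, row):
            if col != view_by:
                attrs.add((col, value))
    return context


def concepts(context):
    """Naive enumeration: intents are all intersections of entity descriptions."""
    all_attrs = set().union(*context.values())
    intents = {frozenset(all_attrs)}
    for attrs in context.values():
        intents |= {frozenset(attrs) & i for i in intents}
    result = []
    for intent in intents:
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        result.append((extent, intent))
    return result


by_title = concepts(build_context(ROWS, COLUMNS, "title"))
by_author = concepts(build_context(ROWS, COLUMNS, "author"))
for extent, intent in sorted(by_author, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Switching `view_by` from "title" to "author" reuses the same result rows but produces a different lattice, which is the point of the two examples above.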
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> (≤ 1900)]
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }
French Films
[Figure: two graph patterns: ?x --dcterms:subject--> FrenchFilm; ?x --rdf:type--> Film together with ?x --dbo:hasCountry--> France]
    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }
From RDF Triples to Formal Context
RDF triples
    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer_Sciences>
    <Person1, rdf:type, dbo:Scientist>
Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer_Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom
Formal context (predicates A to E, objects a to g):
           A: a b | B: c | C: d | D: e | E: f g
Person1    × × × × × ×
Person2    × × × × ×
Person3    × × × × ×
Person4    × × × ×
Person5    × × × ×
Person6    × ×
Person7    × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete
Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        Every person with the field computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$     100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
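The confidence figures of the running example can be recomputed with a short Python sketch. The context below is an assumption: only the columns a to d are used, and their crosses are reconstructed from the supports in the rule table (all seven persons have a and b, Person1 to Person5 also have c and d), since the column positions of the original cross table are not recoverable from the transcript. The helper names are hypothetical.

```python
# Reconstructed part of the formal context (columns a-d only, see lead-in).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def support(attrs):
    """Number of entities whose description contains all of `attrs`."""
    return sum(1 for desc in CONTEXT.values() if attrs <= desc)


def confidence(premise, conclusion):
    """conf(premise -> conclusion) = support(premise and conclusion) / support(premise)."""
    return support(premise | conclusion) / support(premise)


def violators(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return sorted(g for g, desc in CONTEXT.items()
                  if premise <= desc and not conclusion <= desc)


ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))             # the implication c,d => a,b holds
print(round(confidence(ab, cd), 2))   # the converse is only an association rule
print(violators(ab, cd))              # candidates for completion
```

If an expert accepts the rule as a definition, the entities returned by `violators` are the ones whose descriptions would be completed with the missing attributes.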
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition $X \equiv Y$?" and, if yes, Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset       Cars               Videogames     Smartphones        Countries
Restriction   dc:subject         dc:subject     dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type       rdf:type           rdf:type
              dc:subject         dc:subject     dc:subject         dc:subject
              bodyStyle          cp             manufacturer       language
              transmission       developer      operativeSystem    governmentType
              assembly           requirement    developer          leaderType
              designer           genre          cpu                foundingDate
              layout             releaseDate                       gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset         Cars    Videogames  Smartphones  Countries
Subjects        529     655         363          3153
Objects         1291    3265        495          8315
Triples         12519   20146       4710         50000
Concepts        14657   31031       1232         13754
Exec time [s]   1732    1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Pattern Structures
The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:
• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$
Pattern Concept
• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2, g_5\}$, since $g_5 = (5, 8, 5)$ in the numerical table of the Complex Data slide also lies within all three intervals
Complex Data
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5
Interval Pattern Structures [Kaytoue et al., 2011]
Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]
Syntactic Tree [Leeuwenberg et al., 2015]
Linked Data
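The two derivation operators of the Galois connection for pattern structures can be tried out on the numerical table above. The Python sketch below assumes interval patterns in the sense of Kaytoue et al.: the similarity of a set of objects is the smallest interval per attribute that covers all of them, and an object shares a pattern when each of its values lies in the corresponding interval. The function names `description` and `extent` are chosen for this example only.

```python
# Numerical table from the Complex Data slide: object -> (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def description(objects):
    """Meet of the object descriptions: the tightest interval per attribute."""
    rows = [DATA[g] for g in objects]
    return tuple((min(col), max(col)) for col in zip(*rows))


def extent(pattern):
    """All objects whose values fall inside every interval of `pattern`."""
    return {
        g for g, row in DATA.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(row, pattern))
    }


pattern = description({"g1", "g2"})
print(pattern)
print(sorted(extent(pattern)))
```

On this table the pattern of {g1, g2} is ([5, 6], [7, 8], [4, 6]) and its extent also contains g5, because g5 = (5, 8, 5) lies inside all three intervals; the closed pair is therefore ({g1, g2, g5}, ([5, 6], [7, 8], [4, 6])).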
Problem Statement
• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms
Solution
• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.
[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]
Example of RDF Graph for Publications
[Figure: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]
To avoid confusion, we will refer to objects in FCA as entities.
RDF to Entity-Descriptions
tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: taxonomy with top element $\top$ over the classes C1, C2 and C4 to C15]
ACM Computing Classification Taxonomy |T|
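How the triples of the two sources combine into entity descriptions can be sketched in Python. The sketch assumes that the description of a subject collects, for each of its dc:subject objects, the ACCS class that the object is declared a subclass of; this reading is inferred from the tables above and is not confirmed by the slides. Only the six triples shown in the table are used, so the result is smaller than the full descriptions listed for s1 and s2. The function name `describe` is hypothetical.

```python
# (subject, predicate, object, provenance), copied from the triple table.
TRIPLES = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]


def describe(triples, predicate="dc:subject"):
    """Map each subject to the taxonomy classes reached through `predicate`.

    Assumes every term has at most one direct superclass in the schema source.
    """
    superclass = {s: o for s, p, o, _ in triples if p == "rdfs:subClassOf"}
    descriptions = {}
    for s, p, o, _ in triples:
        if p == predicate:
            classes = descriptions.setdefault(s, set())
            if o in superclass:
                classes.add(superclass[o])
    return descriptions


print(describe(TRIPLES))
```

With the excerpt alone, s1 is described by C1, while s2 stays empty because no schema triple for o16 is listed.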
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy with top element $\top$; $\top$ has the children C12 and C6; C12 has the children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has the child C13 (leaves C7, C8, C9)]
Euler tour of the taxonomy, built step by step, with the depth array D:
Index  1       2    3    4    5    6    7    8    9    10   11   12   13   14   15      16   17   18   19   20   21   22   23   24   25
Node   $\top$  C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  $\top$  C6   C13  C7   C13  C8   C13  C9   C13  C6   $\top$
D      0       1    2    3    2    3    2    1    2    3    2    3    2    1    0       1    2    3    2    3    2    3    2    1    0
Range Minimum Query - An Implementation
• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?
Replace each class by its position in the Euler tour of the taxonomy and merge the two sorted lists:
$A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
Query the depth array D for each pair of neighbours:
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the classes C1 and C13, once positions that denote ancestors of other results (8 = C12, 15 = $\top$) or repeat a class (19 = C13) are dropped
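The RMQ-based intersection walked through above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the taxonomy is the one implied by the Euler tour on the slides, the range minimum is found by a plain linear scan (a sparse table or similar index would normally replace it), only neighbouring positions that come from different sets are queried, and all names (`PARENT`, `euler_tour`, `rmq`, `lca`, `intersect`) are hypothetical. Positions are 0-based in the code, whereas the slides count from 1.

```python
# Taxonomy implied by the Euler tour: child -> parent ("T" is the top element).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(root, parent):
    """Return the visited nodes and their depths, one entry per visit."""
    kids = {}
    for child, par in parent.items():
        kids.setdefault(par, []).append(child)
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in kids.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("T", PARENT)
FIRST = {}
for pos, node in enumerate(NODES):
    FIRST.setdefault(node, pos)     # first occurrence of each class in the tour


def rmq(i, j):
    """Position of the minimal depth between i and j (naive linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=DEPTHS.__getitem__)


def lca(x, y):
    """Least common ancestor of two classes via RMQ on the depth array."""
    return NODES[rmq(FIRST[x], FIRST[y])]


def intersect(a_set, b_set):
    """Most specific common ancestors of two sets of classes."""
    tagged = sorted([(FIRST[c], "A") for c in a_set] +
                    [(FIRST[c], "B") for c in b_set])
    candidates = set()
    for (i, src_i), (j, src_j) in zip(tagged, tagged[1:]):
        if src_i != src_j:          # only pairs that mix the two sets
            candidates.add(NODES[rmq(i, j)])
    # Drop every candidate that is an ancestor of another candidate.
    return {c for c in candidates
            if not any(d != c and lca(c, d) == c for d in candidates)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

Sorting the merged positions costs O(n log n) for n classes, after which only n - 1 range queries are needed instead of comparing every class of A with every class of B.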
Resulting Pattern Concept Lattice
[Figure: pattern concept lattice over the nodes K0 to K11, shown next to the entity descriptions and the ACM taxonomy]
Pattern Concept lattice
KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset          |G|    |T|     |Leaves(T)|  |L|       t_RMQ  t_scaling
DBLP             5293   33207   33198        10134     45s    21s
Biomedical Data  63     1490    933          1725582   145s   162s
Experiments on Linked Data

Dataset  entities  attributes  |G|   |T|    |Leaves(T)|  |L|       t_RMQ      t_scaling
BK       96        5           35    626    10           840897    37 sec     42 sec
LO       16        7           16    224    26           1875      0043 sec   0088 sec
NT       131       3           131   140    6            128624    36 sec     68 sec
PO       60        16          22    1236   58           416837    49 sec     57 sec
PT       5000      49          22    4084   60           452316    50 sec     38 sec
PW       200       11          94    436    21           1148656   60 sec     49 sec
PY       74        28          36    340    53           771569    46 sec     40 sec
QU       2178      4           44    8212   8            783013    28 sec     30 sec
TZ       186       61          31    626    88           650041    58 sec     43 sec
VY       52        4           52    202    15           202666    59 sec     116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach ()
Complex Data
m1 m2 m3
g1 5 7 6g2 6 8 4g3 4 8 5g4 4 9 8g5 5 8 5
Interval Pattern Struct[Kaytoue et al 2011]
Graphs and Molecular Structure[Kuznetsov and Samokhin 2005]
Syntactic Tree [Leeuwenberg et al 2015]
Linked Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 16
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework
Definition (RDF Triple)
Given a set of URIs U, blank nodes B, and literals L, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where s is a subject, p is a predicate, and o is an object.

Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)
U = URIs, B = blank nodes, L = literals

Example of RDF Graph for Publications
NapoliLS97 --dc:creator--> Amedeo_Napoli
NapoliLS97 --dc:title--> title1
NapoliLS97 --dc:subject--> Classification

To avoid confusion with RDF objects, we will refer to objects in FCA as entities.

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy T, rooted at ⊤, over the classes C1, C2 and C4 to C15]

Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query: an implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

Euler tour of the taxonomy with the depth array D:
Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

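The Euler tour and the depth array D on this slide can be reproduced with a short script. The following is a minimal sketch, not the authors' implementation: the `CHILDREN` dictionary, the label `"TOP"` for ⊤ and all function names are assumptions made for the example, and the RMQ is a plain linear scan rather than a preprocessed constant-time structure.

```python
from typing import Dict, List, Tuple

# Taxonomy read off the slide; "TOP" stands for the root (our naming).
CHILDREN: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root: str) -> Tuple[List[str], List[int]]:
    """Return visited nodes and their depths; a node is recorded again after each child."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths: List[int], i: int, j: int) -> int:
    """1-based position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda k: depths[k - 1])


def lca(nodes: List[str], depths: List[int], u: str, v: str) -> str:
    """Lowest common ancestor, via RMQ between the first occurrences of u and v."""
    pos_u = nodes.index(u) + 1
    pos_v = nodes.index(v) + 1
    return nodes[rmq(depths, pos_u, pos_v) - 1]


NODES, D = euler_tour("TOP")
print(D)
print(lca(NODES, D, "C1", "C5"))  # C12
```

The lowest common ancestor of two classes is the shallowest node visited between their positions in the tour, which is why a minimum query over D is enough.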
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Each class is replaced by its position in the Euler tour (indices 1 to 25 of the depth array D):
$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
Positions 8 (C12) and 15 (⊤) are ancestors of C1 at position 4, and positions 19 and 21 both denote C13, so the result reduces to the antichain {C1, C13}.

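The worked example above can be checked with a few lines of code. This is a sketch under stated assumptions rather than the algorithm of the cited paper: it takes the intersection of two antichains to be the most specific classes among the pairwise lowest common ancestors, compares all pairs instead of scanning the merged position list, and uses a linear-scan RMQ. The names `E`, `pos`, `rmq`, `lca` and `intersect` are ours, and `"TOP"` stands for ⊤.

```python
from typing import List, Set

# Euler tour and depth array copied from the slide; "TOP" stands for the root.
E: List[str] = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
                "TOP C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
D: List[int] = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
                1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def pos(node: str) -> int:
    """1-based position of the first occurrence of a class in the Euler tour."""
    return E.index(node) + 1


def rmq(i: int, j: int) -> int:
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda k: D[k - 1])


def lca(u: str, v: str) -> str:
    """Lowest common ancestor of two classes."""
    return E[rmq(pos(u), pos(v)) - 1]


def intersect(a: Set[str], b: Set[str]) -> Set[str]:
    """Most specific classes among the pairwise lowest common ancestors."""
    candidates = {lca(x, y) for x in a for y in b}
    # lca(c, o) == c means that c is an ancestor of o, so c is too general.
    return {c for c in candidates
            if not any(o != c and lca(c, o) == c for o in candidates)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect(A, B)))  # ['C1', 'C13']
```

The five RMQ values listed on the slide correspond to `rmq(4, 4)`, `rmq(4, 12)`, `rmq(12, 18)`, `rmq(18, 20)` and `rmq(20, 22)`.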
Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]
[Figure: pattern concept lattice with concepts K0 to K11, K0 drawn at the top and K11 at the bottom]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of the pattern concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11, K0 drawn at the top and K11 at the bottom]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

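One simple way to read the scenario above is as a filter over the concept table: concepts whose intent mentions the unwanted class are hidden from the navigation space. The sketch below only illustrates that reading and is not necessarily how the cited approach handles it. The `CONCEPTS` dictionary transcribes the table, while `hide_class` is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, Tuple

# (extent, classes appearing in the intent), transcribed from the concept table.
Concept = Tuple[FrozenSet[str], FrozenSet[str]]

CONCEPTS: Dict[str, Concept] = {
    "K1": (frozenset({"s1", "s2", "s4", "s5"}), frozenset({"C14"})),
    "K2": (frozenset({"s1", "s3", "s4"}), frozenset({"C12"})),
    "K3": (frozenset({"s1", "s4"}), frozenset({"C7", "C12"})),
    "K4": (frozenset({"s2", "s4", "s5"}), frozenset({"C8"})),
    "K5": (frozenset({"s3", "s4"}), frozenset({"C4"})),
    "K6": (frozenset({"s1"}), frozenset({"C1", "C2", "C7"})),
    "K7": (frozenset({"s4"}), frozenset({"C4", "C7", "C8"})),
    "K8": (frozenset({"s2", "s5"}), frozenset({"C8", "C9"})),
    "K9": (frozenset({"s3"}), frozenset({"C4", "C5"})),
    "K10": (frozenset({"s2"}), frozenset({"C6", "C8", "C9"})),
}


def hide_class(concepts: Dict[str, Concept], unwanted: str) -> Dict[str, Concept]:
    """Keep only the concepts whose intent does not mention the unwanted class."""
    return {kid: c for kid, c in concepts.items() if unwanted not in c[1]}


visible = hide_class(CONCEPTS, "C7")
hidden = sorted(set(CONCEPTS) - set(visible))
print(hidden)  # ['K3', 'K6', 'K7']
```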
4 Creating Views over RDF-Graphs

SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

Graph pattern of the query:
?paper --dc:creator--> ?author
?paper --dc:title--> ?title
?paper --dc:subject--> ?keywords

Mapping $\mu : V \rightarrow U$
Given the set of URIs U and the set of query variables V, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

RDF graph matched by the pattern:
NapoliLS97 --dc:creator--> Amedeo_Napoli
NapoliLS97 --dc:title--> title1
NapoliLS97 --dc:subject--> Classification

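To make the notion of a mapping concrete, the sketch below applies a mapping to the triple patterns of the query and checks whether every instantiated pattern occurs in the example graph. It is a toy illustration, not a SPARQL engine: the names `GRAPH`, `PATTERNS`, `MU`, `apply_mapping` and `is_solution` are assumptions made for this example, and variables are simply strings starting with `?`.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# RDF graph from the slide.
GRAPH: Set[Triple] = {
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
}

# Triple patterns of the query.
PATTERNS: List[Triple] = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

# A mapping from variables to URIs; it extends the slide's ?keywords -> Classification.
MU: Dict[str, str] = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?title": "title1",
    "?keywords": "Classification",
}


def apply_mapping(pattern: Triple, mu: Dict[str, str]) -> Triple:
    """Replace every mapped variable; unmapped terms are left unchanged (mu is partial)."""
    s, p, o = (mu.get(term, term) for term in pattern)
    return (s, p, o)


def is_solution(mu: Dict[str, str]) -> bool:
    """A mapping is a solution if every instantiated pattern is a triple of the graph."""
    return all(apply_mapping(pattern, mu) in GRAPH for pattern in PATTERNS)


print(is_solution(MU))  # True
```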
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v$ = {?title}
• $A_v$ = {?author, ?keyword}
• $G = \mu(E_v)$ = {title1, title2, ...}
• $M_1 = \mu(A_{v_1})$ = {author1, author2, ...}
• $M_2 = \mu(A_{v_2})$ = {WebCrawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.

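The effect of VIEW BY can be sketched as a regrouping of the result rows: the values of the chosen variable become the entities, and the values of the remaining variables become their attributes. The code below is an illustrative sketch only. The rows are a small hypothetical result set in the spirit of the slide, and `view_by` is our name for the helper, not part of SPARQL or of the cited system.

```python
from typing import Dict, List, Set, Tuple

COLUMNS: Tuple[str, ...] = ("title", "keyword", "author")

# Hypothetical query answers, one tuple per result row.
ROWS: List[Tuple[str, ...]] = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows: List[Tuple[str, ...]],
            columns: Tuple[str, ...],
            by: str) -> Dict[str, Set[Tuple[str, str]]]:
    """Group rows by the `by` column; other columns contribute (variable, value) attributes."""
    idx = columns.index(by)
    context: Dict[str, Set[Tuple[str, str]]] = {}
    for row in rows:
        attributes = context.setdefault(row[idx], set())
        for column, value in zip(columns, row):
            if column != by:
                attributes.add((column, value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title))   # ['title1', 'title2']
print(sorted(by_author))  # ['author1', 'author2']
```

Changing the `by` argument yields a different set of entities over the same answers, which is the idea behind viewing one result set from several perspectives.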
Views from Different Perspectives

Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900
?x --rdf:type--> Person
?x --dbp:birthPlace--> Berlin
?x --dbp:birthDate--> ?d, with ?d ≤ 1900

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films
?x --dcterms:subject--> FrenchFilm

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

The same category expressed through type and country:
?x --rdf:type--> Film
?x --dbo:hasCountry--> France

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Predicates: A B C D E
Objects: a b c d e f g
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule                      Confidence  Support  Meaning
$d \Rightarrow c$         100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$         100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$      100%        2        Every person whose field is computer science is a Turing Award-winning scientist
$c, d \Rightarrow a, b$   100%        5        All the scientists who won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$   71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$

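The confidence values in the table can be recomputed from a formal context. The sketch below is only an illustration under explicit assumptions: the `CONTEXT` dictionary is a simplified stand-in that agrees with the supports and confidences listed above (columns f and g are omitted, and which two persons carry attribute e is assumed), and `extent` and `confidence` are our helper names.

```python
from typing import Dict, Set

# Simplified, assumed context: it matches the supports of the table, not the full slide.
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes: Set[str]) -> Set[str]:
    """Entities whose description contains all the given attributes."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def confidence(premise: Set[str], conclusion: Set[str]) -> float:
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


print(round(confidence({"a", "b"}, {"c", "d"}), 2))  # 0.71
print(confidence({"c", "d"}, {"a", "b"}))            # 1.0

# Entities that break the candidate definition {a, b} = {c, d}.
incomplete = sorted(extent({"a", "b"}) - extent({"c", "d"}))
print(incomplete)  # ['Person6', 'Person7']
```

The entities listed in `incomplete` are the ones whose descriptions would be completed if the rule is accepted as a definition.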
Possible Scenario [Alam et al., IJCAI 2015]
Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions
Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset         Cars   Videogames  Smartphones  Countries
Subjects        529    655         363          3153
Objects         1291   3265        495          8315
Triples         12519  20146       4710         50000
Concepts        14657  31031       1232         13754
Exec. time [s]  1732   1714        07           5982
Dataset characteristics

Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 45
Problem Statement
sbquo Linked Data follows distributed architecture
sbquo Some resources only contain RDF triples
sbquo Some resources only contain schema information
sbquo These resources belong to same domain but share only the terms
Solution
sbquo Classify RDF triples based on RDF Schema
sbquo Allow simultaneous access to RDF triples and RDF Schema
sbquo Allow user to interact with the resulting classication
Mehwish Alam Interactive Knowledge Discovery over Web of Data 17
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 18
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4 Creating Views over RDF-Graphs
SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER (
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }
Graph pattern of the query:
    ?paper --dc:creator--> ?author
    ?paper --dc:title--> ?title
    ?paper --dc:subject--> ?keywords
Mapping $\mu : V \to U$
Given the set of URIs U and a set of variables V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification
Matching RDF graph:
    NapoliLS97 --dc:creator--> Amedeo_Napoli
    NapoliLS97 --dc:title--> title1
    NapoliLS97 --dc:subject--> Classification
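The idea of a mapping as a partial function from variables to RDF terms can be sketched as follows. This is only an illustration, assuming that triples and triple patterns are plain Python tuples of strings and that variables start with "?". The helper names `extend` and `solutions` are hypothetical, and the sketch is not a SPARQL engine (no FILTER, no literal typing).

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # partial function: variable -> RDF term

# The small publication graph shown on the slide.
GRAPH: List[Triple] = [
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:subject", "Classification"),
]

# The triple patterns of the query (FILTER omitted in this sketch).
PATTERN: List[Triple] = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]


def extend(mu: Mapping, pattern: Triple, triple: Triple) -> Optional[Mapping]:
    """Extend mu so that pattern matches triple; return None on a conflict."""
    result = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match the triple.
            return None
    return result


def solutions(patterns: List[Triple], graph: List[Triple]) -> List[Mapping]:
    """All mappings that match every pattern against some triple of the graph."""
    mappings: List[Mapping] = [{}]
    for pattern in patterns:
        mappings = [
            ext
            for mu in mappings
            for triple in graph
            if (ext := extend(mu, pattern, triple)) is not None
        ]
    return mappings
```

Calling `solutions(PATTERN, GRAPH)` yields a single mapping that binds ?keywords to Classification, which corresponds to $\mu_1$ above.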
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
Query results:
    ?title   ?keyword             ?author
    title1   RDF                  author1
    title2   Web Crawling         author1
    title2   Web Search Engines   author2
    title2   RDF                  author1
    title2   RDF                  author2
    title2   Web Crawling         author1
- $E_v = \{?title\}$
- $A_v = \{?author, ?keyword\}$
- $G = \mu(E_v) = \{title1, title2, \dots\}$
- $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$
- $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \dots\}$
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
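One plausible reading of VIEW BY is sketched below: the values of the chosen variable become the entities G, and every other variable contributes one set of attributes per entity. The rows are the distinct rows of the result table above. The function `view_by` is a hypothetical helper written for this illustration, not part of SPARQL or of the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Distinct rows of the query result shown on the slide.
ROWS: List[Dict[str, str]] = [
    {"?title": "title1", "?keyword": "RDF", "?author": "author1"},
    {"?title": "title2", "?keyword": "Web Crawling", "?author": "author1"},
    {"?title": "title2", "?keyword": "Web Search Engines", "?author": "author2"},
    {"?title": "title2", "?keyword": "RDF", "?author": "author1"},
    {"?title": "title2", "?keyword": "RDF", "?author": "author2"},
]


def view_by(rows: List[Dict[str, str]], entity_var: str) -> Dict[str, Dict[str, Set[str]]]:
    """Group rows by entity_var; return {attribute variable: {entity: values}}."""
    contexts: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        entity = row[entity_var]
        for var, value in row.items():
            if var != entity_var:
                contexts[var][entity].add(value)
    return contexts


by_title = view_by(ROWS, "?title")    # entities are titles
by_author = view_by(ROWS, "?author")  # same rows, entities are authors
```

Switching the argument from "?title" to "?author" changes which column plays the role of G without re-running the query.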
Views from Different Perspectives
Example
    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title
Example
    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
    ?x --rdf:type--> Person
    ?x --dbp:birthPlace--> Berlin
    ?x --dbp:birthDate--> (≤ 1900)
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }
French Films
As a category:
    ?x --dcterms:subject--> FrenchFilm
    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm
    }
As a type and a country:
    ?x --rdf:type--> Film
    ?x --dbo:hasCountry--> France
    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France
    }
From RDF Triples to Formal Context
RDF triples
    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientist>
Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom
Formal context (columns a, b under A; c under B; d under C; e under D; f, g under E):
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
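The scaling step can be sketched as turning every (predicate, object) pair into one binary attribute. The snippet below is a minimal illustration using the four Person1 triples and the object index from the slide. The function name `formal_context` and the dictionary layout are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# The four triples listed on the slide for Person1.
TRIPLES: List[Triple] = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

# Object index from the slide: each object URI gets a one-letter attribute name.
OBJECT_INDEX: Dict[str, str] = {
    "dbpc:Computer_Scientists": "a",
    "dbpc:Turing_Award_Laureates": "b",
    "dbp:TuringAward": "c",
    "dbo:Scientist": "d",
    "dbp:Computer Sciences": "e",
    "dbo:UnitedStates": "f",
    "dbo:UnitedKingdom": "g",
}


def formal_context(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Map each subject to the set of attribute letters it has a cross for."""
    context: Dict[str, Set[str]] = {}
    for subject, _predicate, obj in triples:
        context.setdefault(subject, set()).add(OBJECT_INDEX[obj])
    return context


context = formal_context(TRIPLES)
```

With only these four triples, Person1 receives crosses for a, b, d and e; the remaining crosses of the slide's table come from triples that are not listed on the slide.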
Getting definitions from implications and association rules
- If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
- $X \equiv Y$ is called a definition
- If $X \Rightarrow Y$ and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data
Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$     100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$             71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
- Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
- Entities Person6 and Person7 need completion in their description
- In fact there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
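The confidence values above can be checked with a few lines of code. The context below is an assumption: the slide only preserves how many crosses each person has, so the exact placement of e, f and g is chosen here to be consistent with the rule table (a, b for all seven persons; c, d for Person1 to Person5; e for two persons). The helper names are hypothetical.

```python
from typing import Dict, Set

# Assumed placement of crosses, consistent with the rule table above.
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs: Set[str]) -> Set[str]:
    """Entities whose description contains every attribute in attrs."""
    return {entity for entity, desc in CONTEXT.items() if attrs <= desc}


def support(attrs: Set[str]) -> int:
    """Number of entities having all attributes in attrs."""
    return len(extent(attrs))


def confidence(premise: Set[str], conclusion: Set[str]) -> float:
    """Fraction of the entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)


# Entities that break the converse rule {a, b} -> {c, d}: candidates for completion.
incomplete = extent({"a", "b"}) - extent({"c", "d"})
```

Here `confidence({"c", "d"}, {"a", "b"})` is 1.0 while `confidence({"a", "b"}, {"c", "d"})` is 5/7, and `incomplete` contains exactly Person6 and Person7.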
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow] Reference Universe -> Formal Context -> Mining Implications -> Ranking Implications -> Can a rule be a definition $X \equiv Y$? -> Yes -> Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
              Cars               Videogames     Smartphones        Countries
Restriction   dc:subject         dc:subject     dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type       rdf:type           rdf:type
              dc:subject         dc:subject     dc:subject         dc:subject
              bodyStyle          cp             manufacturer       language
              transmission       developer      operativeSystem    governmentType
              assembly           requirement    developer          leaderType
              designer           genre          cpu                foundingDate
              layout             releaseDate                       gdpPppRank
FPS = Front_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
                 Cars    Videogames   Smartphones   Countries
Subjects         529     655          363           3153
Objects          1291    3265         495           8315
Triples          12519   20146        4710          50000
Concepts         14657   31031        1232          13754
Exec. time [s]   17.32   17.14        0.7           59.82
Dataset characteristics
Evaluation
- No ground truth for the triples to be added to each dataset
- Human evaluation as the ground truth: each list of implications has a 100% recall
- A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
[Figure: user interface of RV-Xplorer (Rdf-View eXplorer)]
RV-Xplorer (Software Demo)
- Main Topic of Research of a team
- Altering Navigation Space
- Navigation Across Point of Views
- Hide non-interesting parts of lattice
- Search Capabilities
3 RDF Pattern Structures
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where s is a subject, p is a predicate and o is an object.
    Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)
    U = URI, B = Blank Nodes, L = Literal
Example of RDF Graph for Publications
    NapoliLS97 --dc:creator--> Amedeo_Napoli
    NapoliLS97 --dc:title--> title1
    NapoliLS97 --dc:subject--> Classification
To avoid confusion, we will call objects in FCA entities.
RDF to Entity-Descriptions
tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS
RDF triples from several resources
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: ACM Computing Classification Taxonomy T over the classes ⊤, C1, C2 and C4 to C15]
Structured Attribute Sets
- Each subject has a structured set of attributes
- In our model these structured sets of attributes form a taxonomy
How to find similarity in case of Structured Attribute Sets?
- Scaling
- Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
- Range Minimum Query: an implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
Taxonomy:
    ⊤
      C12
        C10: C1, C2
        C11: C4, C5
      C6
        C13: C7, C8, C9
The taxonomy is traversed depth-first and every visited node is recorded together with its depth D. The lowest common ancestor (LCA) of two nodes is then the node of minimal depth between their positions.
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
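The construction of the node and depth arrays can be sketched as a depth-first traversal that records a node every time it is entered or returned to. This is a minimal illustration, assuming the taxonomy is given as a parent-to-children dictionary; the string "T" stands for the top element ⊤, and `euler_tour` and `lca` are hypothetical helper names. The range minimum is found by a linear scan for readability, where a real implementation would typically precompute a structure such as a sparse table.

```python
from typing import Dict, List, Tuple

# Taxonomy from the slide as parent -> ordered children ("T" is the top element).
TAXONOMY: Dict[str, List[str]] = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree: Dict[str, List[str]], root: str) -> Tuple[List[str], List[int]]:
    """Return the visited nodes and their depths, in traversal order."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            # Record the parent again when coming back from a child.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes: List[str], depths: List[int], u: str, v: str) -> str:
    """Lowest common ancestor via a range minimum over the depth array."""
    i, j = sorted((nodes.index(u), nodes.index(v)))
    k = min(range(i, j + 1), key=depths.__getitem__)  # linear-scan RMQ
    return nodes[k]


NODES, D = euler_tour(TAXONOMY, "T")
```

The list `D` reproduces the depth array of the slide, and `lca(NODES, D, "C1", "C5")` returns "C12", the node at position 8.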
Range Minimum Query - An Implementation
- How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Replace each class by its position in the array:
    $A = \{4, 12, 20\}$
    $B = \{4, 18, 22\}$
    $A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
Compute the RMQ of neighbouring positions:
    RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
    Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
Positions 8 (C12) and 15 (⊤) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so the intersection is the anti-chain {C1, C13}.
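The worked example can be replayed with the sketch below. It hard-codes the node and depth arrays from the slide ("T" stands for ⊤; indices are 0-based, so the slide's position 4 is index 3). Two details are assumptions of this sketch rather than statements from the slides: only neighbouring positions that come from different sets are queried (in the example every neighbouring pair is of that kind), and ancestors are filtered out using the first and last occurrence of each node in the tour. The names `rmq`, `is_ancestor` and `intersect` are hypothetical.

```python
from typing import Dict, List, Set

# Node and depth arrays copied from the slide ("T" is the top element).
TOUR: List[str] = (
    "T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
    "T C6 C13 C7 C13 C8 C13 C9 C13 C6 T"
).split()
DEPTH: List[int] = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1,
                    0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last 0-based position of every node in the tour.
FIRST: Dict[str, int] = {}
LAST: Dict[str, int] = {}
for pos, node in enumerate(TOUR):
    FIRST.setdefault(node, pos)
    LAST[node] = pos


def rmq(i: int, j: int) -> int:
    """Index of the minimal depth in DEPTH[i..j], inclusive (linear scan)."""
    return min(range(i, j + 1), key=DEPTH.__getitem__)


def is_ancestor(u: str, v: str) -> bool:
    """True if u is an ancestor of v or u equals v."""
    return FIRST[u] <= FIRST[v] <= LAST[u]


def intersect(a: Set[str], b: Set[str]) -> Set[str]:
    """Most specific common generalisations of two sets of taxonomy classes."""
    merged = sorted([(FIRST[c], "A") for c in a] + [(FIRST[c], "B") for c in b])
    candidates: Set[str] = set()
    for (i, src_i), (j, src_j) in zip(merged, merged[1:]):
        if src_i != src_j:  # only pairs made of one element of A and one of B
            candidates.add(TOUR[rmq(i, j)])
    # Drop every candidate that is a proper ancestor of another candidate.
    return {
        c for c in candidates
        if not any(c != d and is_ancestor(c, d) for d in candidates)
    }


result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

For the two sets of the slide, `result` is {"C1", "C13"}, matching positions 4 and 21.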
Resulting Pattern Concept Lattice
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy over the classes ⊤, C1, C2 and C4 to C15]
[Figure: pattern concept lattice with concepts K0 to K11]
Pattern Concept lattice
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K7    s4                (p1, {C4, C7, C8})
K8    s2, s5            (p1, {C8, C9})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s
Experiments on Linked Data
Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec
Experiments with numerical data from Bilkent University
- Formal contexts were created through inter-ordinal scaling
- Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
- Not all datasets could be processed using the scaling approach ()
What we did so far
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 19
3RDFPattern Structures
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Subject Object
U Y B U U Y B Y L
predicate
U = URI B = Blank Nodes L = Literal
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
Resource Description Framework
Denition (RDF Triple)
Given a set of URIs U blank nodes B and literals L an RDF triple is representedas t ldquo ps p oq P pUY Bq ˆ Uˆ pUY BY Lq where s is a subject p is apredicate and o is an object
Example of RDF Graph for Publications
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
To avoid confusion we will call objects in FCA as entities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 20
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: simplified taxonomy rooted at ⊤ with nodes C1, C2 and C4 to C13]

Index:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
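The arrays on this slide can be reproduced with a depth-first tour of the taxonomy, after which the LCA of two classes is the node of minimal depth between their first occurrences. The following is a minimal sketch, not the authors' implementation: the `CHILDREN` map is read off the node sequence on the slide, `TOP` stands for ⊤, the sparse table is one common way to answer RMQ and is an assumption here, and positions are 0-based (the slide counts from 1).

```python
from typing import Dict, List, Tuple

# Simplified taxonomy, read off the node sequence on the slide ("TOP" = root).
CHILDREN: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root: str) -> Tuple[List[str], List[int]]:
    """Return the visited nodes and their depths (the array D of the slide)."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # record the parent again after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def build_sparse_table(depths: List[int]) -> List[List[int]]:
    """table[j][i] = position of the minimal depth in depths[i : i + 2**j]."""
    n = len(depths)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if depths[left] <= depths[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(table: List[List[int]], depths: List[int], lo: int, hi: int) -> int:
    """Position of the minimal depth in depths[lo..hi], both ends included."""
    j = (hi - lo + 1).bit_length() - 1
    left, right = table[j][lo], table[j][hi - (1 << j) + 1]
    return left if depths[left] <= depths[right] else right


NODES, DEPTHS = euler_tour("TOP")
TABLE = build_sparse_table(DEPTHS)
FIRST: Dict[str, int] = {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)  # first occurrence of each node


def lca(a: str, b: str) -> str:
    """Lowest common ancestor of two taxonomy nodes."""
    lo, hi = sorted((FIRST[a], FIRST[b]))
    return NODES[rmq(TABLE, DEPTHS, lo, hi)]
```

For instance, `lca("C1", "C5")` looks at positions 3 to 11 (4 to 12 on the slide) and returns `C12`.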
Range Minimum Query - An Implementation
- How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

Index:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
$RMQ(4, 4) = 4$, $RMQ(4, 12) = 8$, $RMQ(12, 18) = 15$, $RMQ(18, 20) = 19$ and $RMQ(20, 22) = 21$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
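The worked example above can be replayed in a few lines. This is a hedged sketch rather than the published algorithm: it assumes that only neighbouring elements coming from different sets need to be compared, it replaces the RMQ structure with a plain linear scan to stay short, and the names `CHILDREN`, `TOP` and `intersect` are illustrative.

```python
from typing import Dict, List, Set

# Simplified taxonomy from the slide ("TOP" = root).
CHILDREN: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

NODES: List[str] = []
DEPTHS: List[int] = []


def _visit(node: str, depth: int) -> None:
    """Depth-first tour that records each node and its depth."""
    NODES.append(node)
    DEPTHS.append(depth)
    for child in CHILDREN.get(node, []):
        _visit(child, depth + 1)
        NODES.append(node)
        DEPTHS.append(depth)


_visit("TOP", 0)
FIRST: Dict[str, int] = {}
for pos, name in enumerate(NODES):
    FIRST.setdefault(name, pos)


def rmq(lo: int, hi: int) -> int:
    """Linear-scan stand-in for RMQ: position of the minimal depth in [lo, hi]."""
    return min(range(lo, hi + 1), key=DEPTHS.__getitem__)


def lca(a: str, b: str) -> str:
    lo, hi = sorted((FIRST[a], FIRST[b]))
    return NODES[rmq(lo, hi)]


def intersect(a: Set[str], b: Set[str]) -> Set[str]:
    """Similarity of two sets of taxonomy classes, returned as an anti-chain."""
    # Merge both sets in tour order, remembering where each element came from.
    merged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[x], "B") for x in b])
    candidates: Set[str] = set()
    for (i, src_i), (j, src_j) in zip(merged, merged[1:]):
        if src_i != src_j:  # assumption: compare only neighbours from different sets
            candidates.add(NODES[rmq(i, j)])
    # Drop every candidate that is a proper ancestor of another candidate.
    return {c for c in candidates
            if not any(d != c and lca(c, d) == c for d in candidates)}
```

With the sets of the slide, `intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})` yields `{"C1", "C13"}`, which corresponds to the result {4, 21}.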
Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy T, rooted at ⊤, with nodes C1, C2 and C4 to C15]

[Figure: pattern concept lattice, levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
Pattern concept lattice

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})
Details of the pattern concept lattice
Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University

- Formal contexts were created through inter-ordinal scaling
- Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
- Not all datasets could be processed using the scaling approach
What we did so far
How to allow feedback from a domain expert?
Navigating Concept Lattice
- Entities are papers and descriptions are the classes from ACCS
- Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice, levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Roadmap
Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries
Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: query graph pattern: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \rightarrow U$
Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g., $\mu_1 : \texttt{?keywords} \rightarrow \texttt{Classification}$.

[Figure: matched RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

- $E_v = \{\texttt{?title}\}$
- $A_v = \{\texttt{?author}, \texttt{?keyword}\}$
- $G = \mu(E_v) = \{title_1, title_2, \dots\}$
- $M_1 = \mu(A_{v_1}) = \{author_1, author_2, \dots\}$
- $M_2 = \mu(A_{v_2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
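One way to picture what VIEW BY does with the answers of the query: the values bound to the chosen variable become the entities G, and the values of the remaining variables become attributes. The sketch below only illustrates that reading; `ROWS` copies the answer table of the slide, and `view_by` is a hypothetical helper, not part of LBVA or of any SPARQL engine.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Answers of the query, copied from the slide (variable name -> bound value).
ROWS: List[Dict[str, str]] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
]


def view_by(rows: List[Dict[str, str]], entity_var: str) -> Dict[str, Set[Tuple[str, str]]]:
    """Group the answers by one variable.

    Returns a map entity -> set of (variable, value) attributes, which is a
    formal context written row by row.
    """
    context: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for row in rows:
        entity = row[entity_var]
        for var, value in row.items():
            if var != entity_var:
                context[entity].add((var, value))
    return dict(context)


by_title = view_by(ROWS, "title")    # entities are titles
by_author = view_by(ROWS, "author")  # same answers, entities are authors
```

Switching the argument from `"title"` to `"author"` gives a second context over the same answers, which is the idea behind looking at views from different perspectives.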
Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
5 Completing RDF Data
Problem Statement

People who were born in Berlin before 1900
[Figure: graph pattern: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> value ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films
[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm, together with ?x --rdf:type--> Film and ?x --dbo:hasCountry--> France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }
From RDF Triples to Formal Context

RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates                 Objects
Index  URI                 Index  URI
A      dc:subject          a      dbpc:Computer_Scientists
                           b      dbpc:Turing_Award_Laureates
B      dbp:award           c      dbp:TuringAward
C      rdf:type            d      dbo:Scientist
D      dbp:field           e      dbp:Computer_Sciences
E      dbp:birthPlace      f      dbo:UnitedStates
                           g      dbo:UnitedKingdom

Formal context over the attributes a to g (grouped by predicates A to E); the column positions of the crosses were lost in extraction:
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built, after scaling, from RDF triples taken from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
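The passage from triples to crosses can be sketched as follows. This is an illustration under simple assumptions, not the scaling procedure used in the work: every distinct (predicate, object) pair is treated as one attribute, only the four Person1 triples shown on the slide are used, and `build_context` and `cross_table` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Attribute = Tuple[str, str]

# The four example triples of the slide.
TRIPLES: List[Triple] = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def build_context(triples: List[Triple]):
    """Return (entities, attributes, incidence) for a list of triples.

    Each subject is an entity, each (predicate, object) pair is an attribute,
    and each triple contributes exactly one cross.
    """
    entities = sorted({s for s, _, _ in triples})
    attributes: List[Attribute] = sorted({(p, o) for _, p, o in triples})
    incidence: Set[Tuple[str, Attribute]] = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence


def cross_table(entities, attributes, incidence) -> str:
    """Render the context as text, one row per entity ('x' = cross)."""
    lines = []
    for g in entities:
        marks = " ".join("x" if (g, m) in incidence else "." for m in attributes)
        lines.append(f"{g}: {marks}")
    return "\n".join(lines)


ENTITIES, ATTRIBUTES, INCIDENCE = build_context(TRIPLES)
```

Calling `cross_table(ENTITIES, ATTRIBUTES, INCIDENCE)` prints one row for Person1 with four crosses, one per triple.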
Getting definitions from implications and association rules
- If $X \Longrightarrow Y$ and $Y \Longrightarrow X$, then $X \equiv Y$
- $X \equiv Y$ is called a definition
- If $X \Longrightarrow Y$ and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                           Confidence  Support  Meaning
$d \Longrightarrow c$          100%        5        Every scientist has won a Turing Award
$c \Longrightarrow d$          100%        5        Every person who has won a Turing Award is a scientist
$e \Longrightarrow c, d$       100%        2        Everyone whose field is computer science is a scientist who has won a Turing Award
$c, d \Longrightarrow a, b$    100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$        71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

- Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
- Entities Person6 and Person7 need completion in their descriptions
- In fact, there is such a definition $a, b \equiv c, d$, i.e., $a, b \Longrightarrow c, d$ and $c, d \Longrightarrow a, b$
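The confidence values in the table can be checked with a few set operations. In the sketch below the context is an assumption: it respects the number of crosses per person and the supports listed above, but the exact placement of e, f and g is a guess, since the column positions are not recoverable from the slide. The helper names are illustrative.

```python
from typing import Dict, Set

# Hypothetical context consistent with the rule table (placement of e, f, g guessed).
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs: Set[str]) -> Set[str]:
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def support(attrs: Set[str]) -> int:
    return len(extent(attrs))


def confidence(premise: Set[str], conclusion: Set[str]) -> float:
    """conf(premise -> conclusion) = |ext(premise + conclusion)| / |ext(premise)|."""
    base = support(premise)
    return support(premise | conclusion) / base if base else 0.0


def completion_candidates(premise: Set[str], conclusion: Set[str]) -> Set[str]:
    """Entities that satisfy the premise but lack part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)


AB, CD = {"a", "b"}, {"c", "d"}
# c,d => a,b holds exactly, while a,b -> c,d is only an association rule.
is_implication = confidence(CD, AB) == 1.0
conf_ab_cd = confidence(AB, CD)
to_complete = completion_candidates(AB, CD)
```

Here `conf_ab_cd` is 5/7, about 0.71, and `to_complete` contains Person6 and Person7, the two entities that would receive c and d if the analyst accepts the definition.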
Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe, Formal Context, Mining Implications, Ranking Implications, decision "Can a rule be a definition $X \equiv Y$?", and on "Yes": Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation

Dataset building conditions
              Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    governmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset          Cars    Videogames  Smartphones  Countries
Subjects         529     655         363          3153
Objects          1291    3265        495          8315
Triples          12519   20146       4710         50000
Concepts         14657   31031       1232         13754
Exec. time [s]   1732    1714        07           5982
Dataset characteristics
Evaluation
- No ground truth for the triples to be added to each dataset
- Human evaluation as the ground truth: each list of implications has a 100% recall
- A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
- Main topic of research of a team
- Altering the navigation space
- Navigation across points of view
- Hiding non-interesting parts of the lattice
- Search capabilities
Resource Description Framework

Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications
[Figure: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]

To avoid confusion, we will call objects in FCA entities.
RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources
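To make the step from such triples to entity descriptions concrete, here is a small sketch. It assumes, as a simplification, that each keyword attached by dc:subject in DBLP is lifted to the ACCS class it is declared a subclass of; only the six triples listed above are used, and `entity_descriptions` is an illustrative name rather than a function of the original system.

```python
from typing import Dict, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, provenance)

# The six triples of the table, with their provenance.
TRIPLES: List[Quad] = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]


def entity_descriptions(triples: List[Quad]) -> Dict[str, Set[str]]:
    """Map each DBLP entity to the ACCS classes of its dc:subject keywords."""
    # keyword or class -> its direct super-class, read from the ACCS triples
    parent = {s: o for s, p, o, src in triples
              if p == "rdfs:subClassOf" and src == "ACCS"}
    descriptions: Dict[str, Set[str]] = {}
    for s, p, o, src in triples:
        if p == "dc:subject" and src == "DBLP":
            classes = descriptions.setdefault(s, set())
            if o in parent:  # keywords without a known class are skipped
                classes.add(parent[o])
    return descriptions


DESCRIPTIONS = entity_descriptions(TRIPLES)
```

On this excerpt s1 is described by C1 only, and s2 gets an empty description because o16 has no class among the six triples; the full data set gives the richer descriptions used in the rest of the talk.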
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer_Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Formal context over predicates A to E and objects a to g:
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
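The step from triples to a cross table can be sketched as follows. This is a minimal illustration, not the authors' implementation: the triples listed in TRIPLES and the helper names are assumptions, and each (predicate, object) pair is simply treated as one binary attribute.

```python
# Hypothetical triples in the spirit of the running example.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:award", "dbp:TuringAward"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]

def build_context(triples):
    """Return (entities, attributes, incidence) for a binary formal context."""
    entities = sorted({s for s, _, _ in triples})
    # Scaling: every (predicate, object) pair becomes one binary attribute.
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence

def cross_table(entities, attributes, incidence):
    """Render one text row per entity: 'x' for a cross, '.' for an empty cell."""
    rows = []
    for g in entities:
        cells = "".join("x" if (g, m) in incidence else "." for m in attributes)
        rows.append(f"{g} {cells}")
    return rows

entities, attributes, incidence = build_context(TRIPLES)
table = cross_table(entities, attributes, incidence)
```

In this toy input, Person6 lacks the award and type attributes that Person1 has, which is the kind of gap the completion step looks for.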
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X, then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y and Y → X has high confidence, we may be in the presence of incompletion in data

Rule           Confidence  Support  Meaning
d ⟹ c         100%        5        Every scientist has won a Turing Award
c ⟹ d         100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d      100%        2        Every person having the field computer science is a scientist who has won a Turing Award
c, d ⟹ a, b   100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d    71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
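The confidence check behind the example can be reproduced with a small sketch. The context below is an assumption: it keeps only attributes a, b, c and d, and it assigns them in the one way consistent with the rule table (Person1 to Person5 carry all four, Person6 and Person7 only a and b). The function names are illustrative.

```python
# Hypothetical reduced context, restricted to attributes a, b, c, d.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})

def support(context, itemset):
    """Number of entities whose description contains every attribute in itemset."""
    return sum(1 for attrs in context.values() if itemset <= attrs)

def confidence(context, lhs, rhs):
    """Fraction of the entities satisfying lhs that also satisfy rhs."""
    base = support(context, lhs)
    return support(context, lhs | rhs) / base if base else 0.0

def completion_candidates(context, lhs, rhs):
    """Entities that satisfy lhs but miss part of rhs."""
    return sorted(g for g, attrs in context.items()
                  if lhs <= attrs and not rhs <= attrs)

conf_cd_ab = confidence(CONTEXT, {"c", "d"}, {"a", "b"})
conf_ab_cd = confidence(CONTEXT, {"a", "b"}, {"c", "d"})
to_complete = completion_candidates(CONTEXT, {"a", "b"}, {"c", "d"})
```

One direction holds exactly and the other holds for 5 of 7 entities, so the two remaining entities are the candidates for completion.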
Possible Scenario [Alam et al., IJCAI 2015]
Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank
FPS = Front_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
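One way to obtain precision and recall points from a ranked list of implications is sketched below. The judgments in JUDGMENTS are made up for illustration, and the function is an assumption about how such a curve could be computed, not a description of the evaluation script used for the experiments.

```python
def precision_recall_points(judgments):
    """judgments: booleans in ranking order, True if the expert accepts the
    implication as a definition. Returns (recall, precision) at each hit."""
    total = sum(judgments)
    points, hits = [], 0
    for rank, accepted in enumerate(judgments, start=1):
        if accepted:
            hits += 1
            points.append((hits / total, hits / rank))
    return points

# Hypothetical expert judgments for four ranked implications.
JUDGMENTS = [True, True, False, True]
points = precision_recall_points(JUDGMENTS)
```

Because recall is measured against the accepted rules in the same list, the last point always reaches a recall of 1.0, which matches the 100% recall stated for each list.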
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hide non-interesting parts of the lattice
• Search capabilities
Resource Description Framework
Definition (RDF Triple)
Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object.
Example of RDF Graph for Publications
NapoliLS97  dc:title    title1
NapoliLS97  dc:creator  Amedeo_Napoli
NapoliLS97  dc:subject  Classification
To avoid confusion, we refer to objects in FCA as entities.
RDF to Entity-Descriptions
tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS
RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy T, with root ⊤ and classes C1, C2 and C4 to C15]
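The move from provenance-tagged triples to entity descriptions can be sketched as below. This is an assumption-laden illustration: it uses only the six triples shown in the table, replaces each dc:subject object by its direct rdfs:subClassOf parent when one is listed, and otherwise keeps the raw object. The function name is hypothetical.

```python
from collections import defaultdict

# The six triples from the table: (subject, predicate, object, provenance).
TRIPLES = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]

def entity_descriptions(triples, predicate="dc:subject"):
    """Map each subject to (predicate, set of taxonomy classes)."""
    # Direct parent of every node that has a subClassOf triple.
    parent = {s: o for s, p, o, _ in triples if p == "rdfs:subClassOf"}
    classes = defaultdict(set)
    for s, p, o, _ in triples:
        if p == predicate:
            # Assumption: fall back to the raw object when no class is known.
            classes[s].add(parent.get(o, o))
    return {s: (predicate, found) for s, found in classes.items()}

descriptions = entity_descriptions(TRIPLES)
```

With the complete triple set, the same grouping would produce descriptions such as (dc:subject, {C1, C2, C7}) for s1.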
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy
How to find similarity in case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
Taxonomy
⊤
  C12
    C10
      C1
      C2
    C11
      C4
      C5
  C6
    C13
      C7
      C8
      C9
Euler tour of the taxonomy, with the depth D of each visited node
Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
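The construction of the depth array and its use for lowest common ancestor (LCA) queries can be sketched as follows. The dictionary TREE is an assumed encoding of the taxonomy on the slide, with "T" standing for the top node. The range minimum is found by a plain linear scan here, so this is a readable sketch rather than an efficient RMQ structure. Positions are 0-based in the code and 1-based on the slide.

```python
# Assumed encoding of the slide's taxonomy: parent -> ordered children.
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(tree, root):
    """Return (nodes, depths) in the order of a depth-first Euler tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # return to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths

def lca(nodes, depths, first, a, b):
    """LCA of a and b: the shallowest node between their first occurrences."""
    i, j = sorted((first[a], first[b]))
    k = min(range(i, j + 1), key=depths.__getitem__)  # linear-scan RMQ
    return nodes[k]

nodes, depths = euler_tour(TREE, "T")
first = {}
for pos, node in enumerate(nodes):
    first.setdefault(node, pos)     # first occurrence of each node in the tour
```

For instance, lca(nodes, depths, first, "C1", "C5") scans positions 3 to 11 and returns the node of minimal depth in that range.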
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?
Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
Positions 4 and 21 hold C1 and C13; positions 8 (C12) and 15 (⊤) are dropped because they generalize C1, and position 19 is another occurrence of C13. The intersection is therefore {C1, C13}.
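The worked example can be replayed with the sketch below. It hard-codes the arrays from the slide ("T" stands for the top node) and uses 0-based positions, so slide position 4 is index 3. Two choices are assumptions of this sketch rather than statements from the slide: only neighbouring positions that come from different sets are compared, and the final filter keeps the most specific candidates.

```python
# Depth array and Euler tour copied from the slide.
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
E = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4", "C11",
     "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8", "C13", "C9",
     "C13", "C6", "T"]

FIRST = {}
for pos, node in enumerate(E):
    FIRST.setdefault(node, pos)          # first occurrence of each node

def rmq(i, j):
    """Position of the minimal depth in D[i..j] (linear scan, leftmost wins)."""
    return min(range(i, j + 1), key=D.__getitem__)

def is_ancestor(a, b):
    """True if a is an ancestor of b, or a == b."""
    i, j = sorted((FIRST[a], FIRST[b]))
    return E[rmq(i, j)] == a

def intersect(A, B):
    """Most specific common generalizations of two sets of taxonomy nodes."""
    merged = sorted([(FIRST[n], "A") for n in A] + [(FIRST[n], "B") for n in B])
    candidates = set()
    for (i, src_i), (j, src_j) in zip(merged, merged[1:]):
        if src_i != src_j:               # assumption: compare across sets only
            candidates.add(E[rmq(i, j)])
    # Drop every candidate that is a proper ancestor of another candidate.
    return {c for c in candidates
            if not any(c != d and is_ancestor(c, d) for d in candidates)}

result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

The candidate set before filtering is {C1, C12, T, C13}, which corresponds to positions 4, 8, 15, 19 and 21 on the slide.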
Resulting Pattern Concept Lattice
Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: ACM Computing Classification taxonomy T, with root ⊤ and classes C1, C2 and C4 to C15]
[Figure: pattern concept lattice with nodes K0 to K11]
Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)1 ⊑ (m2)1 means that m1 ≤ m2
• Not all datasets could be processed using scaling approach ()
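Inter-ordinal scaling of a single numerical attribute can be sketched as follows. The data in DATA and the function name are hypothetical. For every observed value t the scaled context gets two binary attributes, "<= t" and ">= t", which is the standard inter-ordinal scale.

```python
def interordinal_scale(values_by_entity):
    """Turn one numerical value per entity into binary '<= t' / '>= t' attributes."""
    thresholds = sorted(set(values_by_entity.values()))
    context = {}
    for entity, value in values_by_entity.items():
        attrs = {f"<= {t}" for t in thresholds if value <= t}
        attrs |= {f">= {t}" for t in thresholds if value >= t}
        context[entity] = attrs
    return context

# Hypothetical numerical data: one value per entity.
DATA = {"g1": 1, "g2": 2, "g3": 3}
scaled = interordinal_scale(DATA)
```

The number of binary attributes grows with the number of distinct values, which is one reason a scaled context can become expensive to process for large numerical datasets.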
What we did so far
How to allow feedback from domain expert
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice with nodes K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
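One simple reading of "not interested in C7" is to hide every concept whose intent mentions C7. The sketch below applies that filter to the concept table. It is an illustrative assumption about the interaction, not the exploration mechanism from the cited paper, and the function name is hypothetical.

```python
# Concept table from the slide: KID -> (extent, classes in the intent).
CONCEPTS = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}

def hide_class(concepts, unwanted):
    """Keep only the concepts whose intent does not mention the unwanted class."""
    return {kid: (extent, intent)
            for kid, (extent, intent) in concepts.items()
            if unwanted not in intent}

visible = hide_class(CONCEPTS, "C7")
hidden = sorted(set(CONCEPTS) - set(visible))
```

Under this reading, three concepts disappear from the navigation space while the remaining ones keep their extents unchanged.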
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
Graph pattern
?paper  dc:title    ?title
?paper  dc:creator  ?author
?paper  dc:subject  ?keywords
Mapping μ: V → U
Given U and V, a mapping μ is a partial function μ: V → U, e.g. μ1: ?keywords → Classification
RDF graph
NapoliLS97  dc:title    title1
NapoliLS97  dc:creator  Amedeo_Napoli
NapoliLS97  dc:subject  Classification
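How a mapping μ arises from matching the graph pattern against the RDF graph can be sketched as below. This is a toy matcher written for illustration, not a SPARQL engine: it ignores FILTER, and the names GRAPH, PATTERN, match_triple and solve are assumptions. A mapping is represented as a dictionary from variables to values.

```python
# The example graph and the graph pattern of the query.
GRAPH = [
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:subject", "Classification"),
]
PATTERN = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

def match_triple(pattern, triple, mu):
    """Extend mapping mu so that pattern matches triple; None if impossible."""
    mu = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was already bound to.
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu

def solve(patterns, graph, mu=None):
    """All mappings that satisfy every triple pattern in patterns."""
    mu = {} if mu is None else mu
    if not patterns:
        return [mu]
    solutions = []
    for triple in graph:
        extended = match_triple(patterns[0], triple, mu)
        if extended is not None:
            solutions.extend(solve(patterns[1:], graph, extended))
    return solutions

solutions = solve(PATTERN, GRAPH)
```

The mapping is partial in general because a variable that never occurs in a matched pattern simply has no entry in the dictionary.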
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title   keyword             author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• Ev = {title}
• Av = {author, keyword}
• G = μ(Ev) = {title1, title2}
• M1 = μ(Av1) = {author1, author2}
• M2 = μ(Av2) = {Web Crawling, RDF}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
RDF to Entity-Descriptions
tid Subject Predicate Object Provenance
t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS
RDF triples from several resources
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
ACM Computing Classication Taxonomy |T |
Mehwish Alam Interactive Knowledge Discovery over Web of Data 21
Structured Attribute Sets
sbquo Each subject has a structured set of attributes
sbquo In our model these structured set of attributes form a taxonomy
How to nd similarity in case of Structured Attribute Sets
sbquo Scalingsbquo Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano 1996]- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
sbquo Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam Aleksey Buzmakov Amedeo Napoli
Alibek Sailanbayev International Conference on Concept Lattice and Their Applications 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 22
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering
[Figure: pattern concept lattice; levels from top to bottom: K0 | K1 K2 | K4 K3 K5 | K8 | K10 K7 K6 K9 | K11]
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
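One way to picture this feedback step is as a filter over the concept table above. The sketch below is only an assumption about how an uninteresting class could be hidden: the function `hide_class` is hypothetical and is not the procedure of the cited paper. It drops every concept whose intent mentions C7 explicitly.

```python
# Concept table from the slide: KID -> (extent, classes in the intent).
concepts = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide_class(concepts, unwanted):
    """Keep only the concepts whose intent does not mention `unwanted`."""
    return {
        kid: (extent, intent)
        for kid, (extent, intent) in concepts.items()
        if unwanted not in intent
    }


visible = hide_class(concepts, "C7")
hidden = set(concepts) - set(visible)
print("hidden concepts:", sorted(hidden))
```

On this table the filter hides K3, K6 and K7, so the papers s1 and s4 remain reachable only through concepts that do not involve Question Answering.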
Roadmap
Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries
Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern of the query: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]
Mapping μ : V → U
Given U and V, a mapping μ is a partial function μ : V → U, e.g., μ1 : ?keywords → Classification
[Figure: the graph pattern instantiated by a mapping: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]
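The role of a mapping can be made concrete with a minimal sketch. It assumes that V is the set of query variables and U the set of RDF terms, represents a mapping μ as a Python dict, and matches the three triple patterns of the query against a tiny in-memory graph. The helpers `match_pattern` and `evaluate` and the toy triples (built from the names in the figure) are illustrative assumptions, not part of any SPARQL engine, and the FILTER clause is left out.

```python
def is_var(term):
    """Variables are written with a leading question mark."""
    return term.startswith("?")


def match_pattern(pattern, triple, mu):
    """Try to extend mapping `mu` so that `pattern` matches `triple`."""
    mu = dict(mu)  # work on a copy, the caller's mapping stays untouched
    for p, t in zip(pattern, triple):
        if is_var(p):
            if mu.setdefault(p, t) != t:
                return None  # variable already bound to another term
        elif p != t:
            return None  # constant does not match
    return mu


def evaluate(patterns, graph):
    """Return all mappings that satisfy every triple pattern."""
    mappings = [{}]
    for pattern in patterns:
        extended = []
        for mu in mappings:
            for triple in graph:
                candidate = match_pattern(pattern, triple, mu)
                if candidate is not None:
                    extended.append(candidate)
        mappings = extended
    return mappings


graph = [
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
]
patterns = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]
solutions = evaluate(patterns, graph)
print(solutions)
```

Each solution is one mapping μ. It is partial in the sense that only the variables occurring in the matched patterns receive a value.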
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1
• Ev = {?title}
• Av = {?author, ?keyword}
• G = μ(Ev) = {title1, title2, ...}
• M1 = μ(Av1) = {author1, author2, ...}
• M2 = μ(Av2) = {WebCrawling, RDF, ...}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
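A small sketch of how the result rows could be turned into a formal context once the view variable is fixed. It assumes that the object set G collects the values of the VIEW BY variable and that every other column contributes attributes. The function `build_context` and the column-prefixed attribute names are hypothetical choices, not the LBVA implementation.

```python
from collections import defaultdict

# Result rows copied from the table above (the repeated row included).
rows = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
]


def build_context(rows, view_by):
    """Group rows by the VIEW BY column; other columns become attributes."""
    context = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            if column != view_by:
                context[row[view_by]].add((column, value))
    return dict(context)


by_title = build_context(rows, "title")
by_author = build_context(rows, "author")
for obj in sorted(by_title):
    print(obj, sorted(by_title[obj]))
```

Passing "author" instead of "title" regroups the same rows around authors, which is the sense in which one query result can be viewed from several perspectives.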
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> a value ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
French Films, queried by type and country:
[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm, ?x --rdf:type--> Film, ?x --dbo:hasCountry--> France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>
Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom
[Table: cross-table over the attributes a, b (A), c (B), d (C), e (D), f, g (E); the column positions of the crosses are not recoverable, only their number per row]
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
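The construction can be sketched as follows. This is a minimal illustration that assumes each (predicate, object) pair becomes one binary attribute. The triples for Person1 follow the slide, while the two Person6 triples are hypothetical additions used only to obtain a second row; the function names are likewise assumptions.

```python
from collections import defaultdict

triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    # Hypothetical triples, not taken from the slide:
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]


def to_formal_context(triples):
    """Map each subject to its set of (predicate, object) attributes."""
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


def print_cross_table(context):
    """Print one numbered attribute per line, then one row per object."""
    attributes = sorted({a for row in context.values() for a in row})
    for index, (predicate, obj) in enumerate(attributes):
        print(f"{index}: {predicate} {obj}")
    for subject in sorted(context):
        marks = " ".join("x" if a in context[subject] else "." for a in attributes)
        print(subject, marks)


context = to_formal_context(triples)
print_cross_table(context)
```

Every "x" in the printed table stands for exactly one input triple, which mirrors the reading of the crosses given above.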
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y and Y → X has high confidence, the data may be incomplete
Rule           Confidence  Support  Meaning
d ⟹ c          100%        5        Every scientist has won a Turing Award
c ⟹ d          100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d       100%        2        All the people having the field computer science are Turing Award winning scientists
c, d ⟹ a, b    100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d    71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition a, b ≡ c, d, i.e., a, b ⟹ c, d and c, d ⟹ a, b
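The confidence values of the running example can be reproduced with a short sketch. The context below is an assumption: the exact positions of the crosses are not available here, so the rows were chosen to agree with the supports and confidences listed in the table (Person1 to Person5 carry a, b, c, d; Person6 and Person7 carry only a, b; which persons carry e, f or g is a guess). Support is counted on the premise, as the table appears to do.

```python
# Assumed context, consistent with the rule table but not copied from it.
context = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Objects that carry every attribute in `attrs`."""
    return {g for g, row in context.items() if attrs <= row}


def support(premise):
    """Number of objects satisfying the premise of a rule."""
    return len(extent(premise))


def confidence(premise, conclusion):
    """Share of the premise's objects that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))               # 1.0
print(round(confidence(ab, cd), 2))     # 0.71
incomplete = extent(ab) - extent(cd)
print(sorted(incomplete))               # ['Person6', 'Person7']
```

The objects in `incomplete` are exactly those that keep the rule from being an implication, so they are the candidates for completion if the rule is accepted as a definition.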
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition X ≡ Y?" → Yes → Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset      Cars                         Videogames            Smartphones                  Countries
Restriction  dc:subject dbpc:Sports_cars  dc:subject dbpc:FPS   dc:subject dbpc:Smartphones  rdf:type Country
Predicates   rdf:type                     rdf:type              rdf:type                     rdf:type
             dc:subject                   dc:subject            dc:subject                   dc:subject
             bodyStyle                    cp                    manufacturer                 language
             transmission                 developer             operativeSystem              governmentType
             assembly                     requirement           developer                    leaderType
             designer                     genre                 cpu                          foundingDate
             layout                       releaseDate                                        gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset        Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  1732    1714        0.7          5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities
Structured Attribute Sets
• Each subject has a structured set of attributes
• In our model these structured sets of attributes form a taxonomy
How to find similarity in the case of Structured Attribute Sets?
• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation
Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.
[Figure: taxonomy tree: ⊤ has the children C12 and C6; C12 has the children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has the child C13 (with leaves C7, C8, C9)]
i      1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D[i]   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
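The depth array D and the node sequence on this slide are those of an Euler tour of the taxonomy, and they can be regenerated with a short sketch. It assumes the taxonomy is given as a child list (read off the figure and the node sequence) and writes the top element as the string "TOP". The helper `euler_tour` is a hypothetical name. Python lists are 0-based, whereas the slide numbers positions from 1.

```python
TOP = "TOP"  # stands for the top element of the taxonomy
children = {
    TOP: ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root, children):
    """Return (nodes, depths) recorded along an Euler tour of the tree."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # record the parent again when coming back
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, D = euler_tour(TOP, children)

# First 1-based position of every node, as used on the slides.
first = {}
for position, node in enumerate(nodes, start=1):
    first.setdefault(node, position)

print(D)
print(first["C1"], first["C5"], first["C8"])
```

The lowest common ancestor of two classes is then the node of minimal depth between their first positions, which is where the range minimum query comes in.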
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?
i      1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D[i]   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
Positions 8 (C12) and 15 (⊤) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so the resulting antichain is {C1, C13}.
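The steps of this example can be replayed with a minimal sketch. It copies the two arrays from the slide, writes ⊤ as "TOP", and answers each range minimum query with a naive linear scan; an actual implementation would presumably preprocess D for faster queries. The helper names (`rmq`, `first_pos`, `is_ancestor`, `intersect`) are hypothetical, and the final pruning step is one assumed way of reducing the result to an antichain.

```python
# Euler tour copied from the slide; positions are 1-based there.
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()


def rmq(i, j):
    """1-based position of the minimal depth in D[i..j] (naive scan)."""
    return min(range(i, j + 1), key=lambda k: D[k - 1])


def first_pos(node):
    """1-based position of the first occurrence of `node` in the tour."""
    return NODES.index(node) + 1


def is_ancestor(a, b):
    """True if `a` is an ancestor of `b` (or equal to it)."""
    i, j = sorted((first_pos(a), first_pos(b)))
    return NODES[rmq(i, j) - 1] == a


def intersect(A, B):
    """Similarity of two sets of leaves, following the steps of the slide."""
    positions = sorted([first_pos(c) for c in A] + [first_pos(c) for c in B])
    lcas = {NODES[rmq(i, j) - 1] for i, j in zip(positions, positions[1:])}
    # Keep only the most specific classes, so that the result is an antichain.
    return {c for c in lcas
            if not any(c != d and is_ancestor(c, d) for d in lcas)}


result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(result))  # ['C1', 'C13']
```

Only consecutive positions of the merged, sorted list are queried, so the number of range minimum queries grows linearly with the size of the two descriptions.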
Resulting Pattern Concept Lattice
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: taxonomy rooted at ⊤ over the classes C1, C2 and C4 to C15]
[Figure: pattern concept lattice; levels from top to bottom: K0 | K1 K2 | K4 K3 K5 | K8 | K10 K7 K6 K9 | K11]
Pattern Concept lattice
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 ]J
1
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 ]J C12
1 2
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 ]J C12 C10
1 2 3
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?
Position:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4A, 4B, 12A, 18B, 20A, 22B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
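The worked example on this slide can be replayed step by step. The following Python is a sketch under assumptions: the tour and depth arrays are copied from the slide with 1-based positions, "TOP" stands in for ⊤, a linear scan replaces a real RMQ index, and the rules "query only neighbouring positions that come from different sets" and "keep only the most specific classes" are one reading of how {4, 8, 15, 19, 21} reduces to {4, 21}. All function names are hypothetical.

```python
from typing import Iterable, List, Set

# Euler tour copied from the slide; index 0 is a dummy so positions are 1-based.
NODES = [None] + (
    "TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
    "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP"
).split()
DEPTH = [None, 0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
         1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i: int, j: int) -> int:
    """Position of the minimal depth in D[i..j]; a linear scan for brevity."""
    return min(range(i, j + 1), key=lambda p: DEPTH[p])


def positions(classes: Iterable[str]) -> List[int]:
    """First position of each class in the tour."""
    return sorted(NODES.index(c) for c in classes)


def intersect(a: Iterable[str], b: Iterable[str]) -> List[int]:
    """RMQ between neighbouring positions of the merged, tagged list."""
    tagged = sorted([(p, "A") for p in positions(a)] +
                    [(p, "B") for p in positions(b)])
    hits = []
    for (p, s), (q, t) in zip(tagged, tagged[1:]):
        if s != t:  # assumption: only pairs mixing A and B are queried
            hits.append(rmq(p, q))
    return hits


def lca(u: str, v: str) -> str:
    i, j = sorted((NODES.index(u), NODES.index(v)))
    return NODES[rmq(i, j)]


def most_specific(hit_positions: Iterable[int]) -> Set[str]:
    """Drop duplicates and every class that is an ancestor of another hit."""
    found = {NODES[p] for p in hit_positions}
    return {c for c in found
            if not any(o != c and lca(c, o) == c for o in found)}


hits = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(hits)                 # [4, 8, 15, 19, 21]
print(most_specific(hits))  # the classes C1 and C13
```

Positions 8 and 15 correspond to C12 and ⊤, which are ancestors of C1, and positions 19 and 21 both denote C13, so the reduced result names the classes C1 and C13.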
Resulting Pattern Concept Lattice
Entities S   Descriptions d
s1           (dc:subject : {C1, C2, C7})
s2           (dc:subject : {C6, C8, C9})
s3           (dc:subject : {C4, C5})
s4           (dc:subject : {C4, C7, C8})
s5           (dc:subject : {C8, C9})
[Figure: taxonomy tree over the nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
[Figure: Pattern Concept lattice with the concepts K0 to K11]
Details of Pattern Concept lattice:
KID   Extent           Intent
K1    s1, s2, s4, s5   (p1 : {C14})
K2    s1, s3, s4       (p1 : {C12})
K3    s1, s4           (p1 : {C7, C12})
K4    s2, s4, s5       (p1 : {C8})
K5    s3, s4           (p1 : {C4})
K6    s1               (p1 : {C1, C2, C7})
K8    s2, s5           (p1 : {C8, C9})
K7    s4               (p1 : {C4, C7, C8})
K9    s3               (p1 : {C4, C5})
K10   s2               (p1 : {C6, C8, C9})
Experimentation
Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s
Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using scaling approach ()
What we did so far
How to allow feedback from domain expert
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: concept lattice with the concepts K0 to K11]
KID   Extent           Intent
K1    s1, s2, s4, s5   (p1 : {C14})
K2    s1, s3, s4       (p1 : {C12})
K3    s1, s4           (p1 : {C7, C12})
K4    s2, s4, s5       (p1 : {C8})
K5    s3, s4           (p1 : {C4})
K6    s1               (p1 : {C1, C2, C7})
K8    s2, s5           (p1 : {C8, C9})
K7    s4               (p1 : {C4, C7, C8})
K9    s3               (p1 : {C4, C5})
K10   s2               (p1 : {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
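One simple way to picture the analyst's feedback is to hide every concept whose intent mentions the unwanted class. The Python below is only a sketch of that idea: the dictionary `CONCEPTS` transcribes the table on this slide, while the function name `hide` and the rule "hide a concept if its intent contains the class" are assumptions made here, not a description of the presented tool.

```python
from typing import Dict, Set, Tuple

# Concept id -> (extent, intent), transcribed from the table on the slide.
CONCEPTS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide(concepts: Dict[str, Tuple[Set[str], Set[str]]],
         unwanted: str) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Keep only the concepts whose intent does not mention the unwanted class."""
    return {kid: (extent, intent)
            for kid, (extent, intent) in concepts.items()
            if unwanted not in intent}


visible = hide(CONCEPTS, "C7")
hidden = sorted(set(CONCEPTS) - set(visible))
print(hidden)  # ['K3', 'K6', 'K7']
```

With C7 excluded, the concepts K3, K6 and K7 drop out of the navigation space and the remaining seven stay available for exploration.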
Roadmap
Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries
Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT distinct ?title ?keywords ?author
where {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]
Mapping µ : V → U
Given U and V, a mapping µ is a partial function µ : V → U, e.g. µ1 : ?keywords → Classification
[Figure: a matching subgraph: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]
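The notion of a mapping µ as a partial function from variables to terms can be made concrete with a dictionary. The Python below is a minimal sketch: the triple patterns and the values NapoliLS97, Amedeo_Napoli, title1 and Classification come from the slide, whereas the names `PATTERNS`, `GRAPH`, `apply_mapping` and `is_solution` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# The three triple patterns of the query (variables start with "?").
PATTERNS: List[Triple] = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

# A tiny RDF graph containing the subgraph drawn on the slide.
GRAPH: Set[Triple] = {
    ("NapoliLS97", "dc:creator", "Amedeo_Napoli"),
    ("NapoliLS97", "dc:title", "title1"),
    ("NapoliLS97", "dc:subject", "Classification"),
}

# A mapping is partial: it may bind only some of the variables.
MU1: Dict[str, str] = {"?keywords": "Classification"}
MU: Dict[str, str] = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?title": "title1",
    "?keywords": "Classification",
}


def apply_mapping(mu: Dict[str, str], pattern: Triple) -> Triple:
    """Replace every bound variable by its value; unbound variables stay as they are."""
    s, p, o = (mu.get(term, term) for term in pattern)
    return (s, p, o)


def is_solution(mu: Dict[str, str], patterns: List[Triple], graph: Set[Triple]) -> bool:
    """A mapping is a solution when every instantiated pattern is a triple of the graph."""
    return all(apply_mapping(mu, pattern) in graph for pattern in patterns)


print(apply_mapping(MU1, PATTERNS[2]))
print(is_solution(MU, PATTERNS, GRAPH))   # True
print(is_solution(MU1, PATTERNS, GRAPH))  # False
```

The partial mapping µ1 alone leaves ?paper unbound, so it does not yet select a subgraph; only the full mapping turns all three patterns into triples of the graph.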
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1
• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2, ...}
• M1 = µ(Av1) = {author1, author2, ...}
• M2 = µ(Av2) = {Web Crawling, RDF, ...}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattice and Their Applications, 2014.
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
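How a VIEW BY variable turns the same result rows into different contexts can be sketched in a few lines. The Python below is illustrative: the rows are the distinct rows of the result table from the preceding slide as they appear in this transcript, and the function `view_by` together with its nested-dictionary output is an assumption made here, not the interface of the presented system. The variable named in VIEW BY supplies the objects; every other variable supplies one family of attributes.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Distinct result rows of the query, as listed in the result table.
ROWS: List[Dict[str, str]] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows: List[Dict[str, str]], object_var: str) -> Dict[str, Dict[str, Set[str]]]:
    """Group rows by the VIEW BY variable: object -> attribute variable -> values."""
    context: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        obj = row[object_var]
        for var, value in row.items():
            if var != object_var:
                context[obj][var].add(value)
    return {obj: dict(attrs) for obj, attrs in context.items()}


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(sorted(by_title))                # objects G when viewing by title
print(by_title["title2"]["author"])    # authors attached to title2
print(by_author["author2"]["title"])   # titles attached to author2
```

Switching the VIEW BY variable swaps the roles of objects and attributes, so the same query result yields one lattice organised around papers and another organised around authors.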
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern: x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: graph pattern: x has dcterms:subject FrenchFilm]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}
[Figure: graph pattern: x has rdf:type Film and dbo:hasCountry France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>
Predicates                  Objects
Index   URI                 Index   URI
A       dc:subject          a       dbpc:Computer_Scientists
                            b       dbpc:Turing_Award_Laureates
B       dbp:award           c       dbp:TuringAward
C       rdf:type            d       dbo:Scientist
D       dbp:field           e       dbp:Computer Sciences
E       dbp:birthPlace      f       dbo:UnitedStates
                            g       dbo:UnitedKingdom
Formal context with the columns A: a b, B: c, C: d, D: e, E: f g
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
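The step from triples to a formal context can be illustrated with the four Person1 triples listed on this slide. The Python below is a sketch, not the scaling procedure of the presented work: it assumes that each (predicate, object) pair becomes one binary attribute, and the names `TRIPLES`, `build_context` and `cross_table` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Attribute = Tuple[str, str]  # (predicate, object)

# The four triples about Person1 given on the slide.
TRIPLES: List[Triple] = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def build_context(triples: List[Triple]) -> Dict[str, Set[Attribute]]:
    """Subject -> set of (predicate, object) attributes; one cross per triple."""
    context: Dict[str, Set[Attribute]] = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


def cross_table(context: Dict[str, Set[Attribute]]) -> Tuple[List[Attribute], Dict[str, List[str]]]:
    """Render the context as rows of 'x' (cross) and '.' (no cross)."""
    attributes = sorted({a for attrs in context.values() for a in attrs})
    rows = {g: ["x" if a in attrs else "." for a in attributes]
            for g, attrs in context.items()}
    return attributes, rows


context = build_context(TRIPLES)
attributes, rows = cross_table(context)
for attribute in attributes:
    print(attribute)
print(rows["Person1"])
```

Keeping the predicate inside the attribute preserves the grouping of object columns under their predicate, as in the two-level header of the context.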
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y holds and Y → X has high confidence, the data may be incomplete
Rule          Confidence   Support   Meaning
d ⟹ c         100%         5         Every scientist has won a Turing Award
c ⟹ d         100%         5         Every person who has won a Turing Award is a scientist
e ⟹ c, d      100%         2         All the people having the field computer science are Turing Award winning scientists
c, d ⟹ a, b   100%         5         All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d   71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example:
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
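The confidence values in the rule table can be checked with a small script. The Python below is a sketch under a loudly stated assumption: the incidence in `CONTEXT` is hypothetical, chosen only so that it agrees with the rule table (seven persons have a and b, five of them also have c and d, two have e). The attributes f and g are left out, and all function names are introduced here for illustration.

```python
from typing import Dict, Set

# Hypothetical incidence, consistent with the rule table but not taken from it.
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs: Set[str]) -> Set[str]:
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(premise: Set[str], conclusion: Set[str]) -> float:
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    if not covered:
        return 0.0
    return len(extent(premise | conclusion)) / len(covered)


def is_definition(x: Set[str], y: Set[str]) -> bool:
    """X and Y define each other when both implications hold with confidence 1."""
    return confidence(x, y) == 1.0 and confidence(y, x) == 1.0


def needs_completion(premise: Set[str], conclusion: Set[str]) -> Set[str]:
    """Entities that satisfy the premise but lack part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)


ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))                 # 1.0
print(round(confidence(ab, cd), 2))       # 0.71
print(is_definition({"c"}, {"d"}))        # True
print(sorted(needs_completion(ab, cd)))   # ['Person6', 'Person7']
```

Adding c and d to the two entities returned by `needs_completion` raises the confidence of a, b → c, d to 1, at which point a, b ≡ c, d holds as a definition in this toy context.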
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, "Can a rule be a definition X ≡ Y?", Yes, Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions
Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    govenmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank
(FPS: Front_Person_Shooters; cp: computerPlatform)
Predicates used to construct each dataset
Dataset Characteristics
Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the datasets Cars, Smartphones, Countries and Videogames]
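The relation between precision and recall along a ranked list of implications can be sketched as follows. The Python below is illustrative only: the list of judgements is hypothetical (nine implications accepted as definitions, one rejected) and merely mirrors the "9 out of 10" reading of a precision of 0.9; the function name `precision_recall_points` is introduced here.

```python
from typing import List, Tuple


def precision_recall_points(judgements: List[bool]) -> List[Tuple[float, float]]:
    """(recall, precision) after each rank of a list judged by a human evaluator."""
    total_valid = sum(judgements)
    if total_valid == 0:
        return []
    points = []
    valid_so_far = 0
    for k, accepted in enumerate(judgements, start=1):
        valid_so_far += accepted
        points.append((valid_so_far / total_valid, valid_so_far / k))
    return points


# Hypothetical judgements for ten ranked implications: True = usable as a definition.
judged = [True] * 9 + [False]
points = precision_recall_points(judged)
print(points[-1])  # (1.0, 0.9)
```

Because the human judgements themselves serve as ground truth, recall always reaches 1.0 at the end of each list, and only the precision differs between datasets.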
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

Taxonomy:
⊤
  C12
    C10
      C1  C2
    C11
      C4  C5
  C6
    C13
      C7  C8  C9

Euler tour of the taxonomy with the depth array D:
Position    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node        ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D           0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
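The reduction from LCA to RMQ on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is the one read off the Euler tour above, `TOP` is a stand-in name for the root, and the sparse table is one common way to answer RMQ in constant time after O(n log n) preprocessing; the slides do not say which RMQ structure was actually used.

```python
from typing import Dict, List

# Taxonomy read off the Euler tour on the slide; "TOP" stands for the root.
TREE: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return (nodes, depths, first): the tour, the depth array D and the
    first 0-based position of every node in the tour."""
    nodes, depths, first = [], [], {}

    def visit(node, depth):
        first.setdefault(node, len(nodes))
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths, first


def build_sparse_table(depths):
    """table[k][i] = position of the minimum of depths[i : i + 2**k]."""
    n = len(depths)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if depths[left] <= depths[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(table, depths, i, j):
    """Position of the minimum depth in the closed range [i, j] (0-based)."""
    if i > j:
        i, j = j, i
    k = (j - i + 1).bit_length() - 1
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    return left if depths[left] <= depths[right] else right


NODES, DEPTHS, FIRST = euler_tour(TREE, "TOP")
TABLE = build_sparse_table(DEPTHS)


def lca(x, y):
    """Least common ancestor = shallowest node between the first occurrences."""
    return NODES[rmq(TABLE, DEPTHS, FIRST[x], FIRST[y])]


print(lca("C1", "C5"))  # C12
print(lca("C7", "C9"))  # C13
```

The positions in this sketch are 0-based, so `FIRST["C1"] + 1` gives the position 4 used on the slides.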
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Position    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node        ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D           0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
Positions 8 (C12) and 15 (⊤) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so the intersection is {C1, C13}.
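The worked example can be replayed with a short Python sketch. It hard-codes the tour and depth array from the slide, writes `TOP` for the root, and uses a naive linear scan for RMQ purely to stay short; the helper names (`position`, `candidates`, `intersect`) are illustrative and not taken from the original implementation.

```python
# Euler tour and depth array copied from the slide ("TOP" is the root).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def position(node):
    """1-based position of the first occurrence of a node in the tour."""
    return NODES.index(node) + 1


def rmq(i, j):
    """1-based position of the minimum depth in the closed range [i, j].
    A naive scan; a sparse table would answer the same query in O(1)."""
    return min(range(i, j + 1), key=lambda p: DEPTHS[p - 1])


def lca(x, y):
    i, j = sorted((position(x), position(y)))
    return NODES[rmq(i, j) - 1]


def candidates(a, b):
    """Merge the positions of both sets and query consecutive pairs."""
    merged = sorted([position(c) for c in a] + [position(c) for c in b])
    return [rmq(i, j) for i, j in zip(merged, merged[1:])]


def intersect(a, b):
    """Keep only the most specific candidate classes."""
    found = {NODES[p - 1] for p in candidates(a, b)}
    return {x for x in found
            if not any(x != y and lca(x, y) == x for y in found)}


A = ["C1", "C5", "C8"]
B = ["C1", "C7", "C9"]
print(candidates(A, B))         # [4, 8, 15, 19, 21]
print(sorted(intersect(A, B)))  # ['C1', 'C13']
```

The final filter drops every candidate that is a strict ancestor of another candidate, which is how {4, 8, 15, 19, 21} shrinks to the two classes C1 and C13.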
Resulting Pattern Concept Lattice
Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})
[Figure: taxonomy over the nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
[Figure: pattern concept lattice, drawn top to bottom as K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
Pattern Concept lattice
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s
Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using scaling approach ()
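Inter-ordinal scaling, the baseline used for the numerical datasets, can be sketched as follows. This is a minimal Python illustration of the standard construction (one binary attribute "≤ w" and one "≥ w" per observed value w); the function names and the toy values are assumptions, not taken from the experiments.

```python
def interordinal_scale(values):
    """values: dict entity -> number.
    Returns dict entity -> set of binary attributes '<= w' and '>= w',
    one pair of attributes for every value w observed in the data."""
    thresholds = sorted(set(values.values()))
    context = {}
    for entity, v in values.items():
        attrs = {f"<= {w}" for w in thresholds if v <= w}
        attrs |= {f">= {w}" for w in thresholds if v >= w}
        context[entity] = attrs
    return context


def extent(context, attribute):
    """All entities that carry the given binary attribute."""
    return {g for g, attrs in context.items() if attribute in attrs}


# Toy numerical attribute (hypothetical values).
ctx = interordinal_scale({"g1": 1, "g2": 3, "g3": 5})
print(sorted(ctx["g2"]))                # ['<= 3', '<= 5', '>= 1', '>= 3']
print(sorted(extent(ctx, "<= 1")))      # ['g1']
print(sorted(extent(ctx, "<= 3")))      # ['g1', 'g2']
```

The last two lines show the ordering the slide refers to: the extent of "≤ 1" is contained in the extent of "≤ 3" because 1 ≤ 3. The number of binary attributes grows with the number of distinct values, which is one plausible reason the scaling approach becomes expensive on larger datasets.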
What we did so far
How to allow feedback from domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: pattern concept lattice, drawn top to bottom as K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }
[Figure: graph pattern in which ?paper is linked to ?author (dc:creator), ?title (dc:title) and ?keywords (dc:subject)]
Mapping μ : V → U
Given U and V, a mapping μ is a partial function μ : V → U, e.g. μ1 : ?keywords → Classification
[Figure: instantiated pattern in which NapoliLS97 is linked to Amedeo_Napoli (dc:creator), title1 (dc:title) and Classification (dc:subject)]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• E_v = {title}
• A_v = {author, keyword}
• G = μ(E_v) = {title1, title2, ...}
• M1 = μ(A_v1) = {author1, author2, ...}
• M2 = μ(A_v2) = {Web Crawling, RDF, ...}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
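How a VIEW BY clause could turn a result table into formal contexts can be sketched in Python. The rows below are hypothetical and the function `view_by` is an illustrative name; the sketch only mirrors the definitions on the slide (the VIEW BY variable yields the object set G, every other variable yields an attribute set M_i) and is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical answer rows of the SELECT query: (title, keyword, author).
VARIABLES = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
    ("title3", "Web Crawling", "author2"),
]


def view_by(rows, variables, object_var):
    """Return (G, contexts). G is the set of values bound to the VIEW BY
    variable; contexts maps each remaining variable to a pair
    (attribute set M_i, incidence as dict object -> set of attributes)."""
    col = variables.index(object_var)
    objects = sorted({row[col] for row in rows})
    contexts = {}
    for k, var in enumerate(variables):
        if k == col:
            continue
        incidence = defaultdict(set)
        for row in rows:
            incidence[row[col]].add(row[k])
        attributes = sorted(set().union(*incidence.values()))
        contexts[var] = (attributes, dict(incidence))
    return objects, contexts


G, ctx = view_by(ROWS, VARIABLES, "title")
print(G)                 # ['title1', 'title2', 'title3']
print(ctx["author"][0])  # ['author1', 'author2']

# Same rows seen from another perspective: authors become the objects.
G2, ctx2 = view_by(ROWS, VARIABLES, "author")
print(G2)                # ['author1', 'author2']
```

Switching the VIEW BY variable changes which column supplies the objects without re-running the query, which is the idea behind viewing the same answers from different perspectives.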
Views from Different Perspectives
Example
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
Example
  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern in which ?x is linked to Person (rdf:type), Berlin (dbp:birthPlace) and a value ≤ 1900 (dbp:birthDate)]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }
French Films
[Figure: graph pattern in which ?x is linked to FrenchFilm (dcterms:subject)]
  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm .
  }
[Figure: the same ?x is also linked to Film (rdf:type) and France (dbo:hasCountry)]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France .
  }
From RDF Triples to Formal Context
RDF triples
  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index   URI            Index   URI
A       dc:subject     a       dbpc:Computer_Scientists
                       b       dbpc:Turing_Award_Laureates
B       dbp:award      c       dbp:TuringAward
C       rdf:type       d       dbo:Scientist
D       dbp:field      e       dbp:Computer Sciences
E       dbp:birthPlace f       dbo:UnitedStates
                       g       dbo:UnitedKingdom

Predicates: A B C D E; objects: a b c d e f g
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
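The step from triples to a formal context can be illustrated with a short Python sketch: every subject becomes an object of the context and every (predicate, object) pair becomes a binary attribute. The Person1 triples are the ones listed on the slide; the two Person6 triples and the function names are assumptions added for illustration.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
    # Hypothetical triples for a second entity.
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]


def build_context(triples):
    """Scale triples into a formal context (G, M, I):
    G = subjects, M = (predicate, object) pairs, I = set of crosses."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


def cross_table(objects, attributes, incidence):
    """One row per object, 'x' where the triple exists and '.' otherwise."""
    return {g: ["x" if (g, m) in incidence else "." for m in attributes]
            for g in objects}


G, M, I = build_context(TRIPLES)
for entity, row in cross_table(G, M, I).items():
    print(entity, " ".join(row))
```

Each `x` printed by the loop corresponds to exactly one input triple, matching the statement that every cross stands for a triple.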
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y and Y → X has high confidence, we may be in the presence of incompletion in data

Rule           Confidence   Support   Meaning
d ⟹ c          100%         5         Every scientist has won a Turing Award
c ⟹ d          100%         5         Every person who has won a Turing Award is a scientist
e ⟹ c, d       100%         2         All the people having the field computer science are Turing Award winner scientists
c, d ⟹ a, b    100%         5         All the Scientists winning Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d    71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won Turing Award
Association rules for the running example

• Example:
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
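The confidence figures in the running example can be reproduced with a small Python sketch. The context below is an assumption consistent with the rules above (Person1 to Person5 carry a, b, c, d; Person6 and Person7 carry only a, b); attributes e, f and g are left out, and the 0.7 threshold in `classify` is an arbitrary illustrative choice.

```python
# Assumed context, restricted to attributes a, b, c, d.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(premise, conclusion):
    """conf(premise -> conclusion) = |ext(premise ∪ conclusion)| / |ext(premise)|."""
    covered = len(extent(premise))
    if covered == 0:
        return 0.0
    return len(extent(premise | conclusion)) / covered


def classify(x, y, min_conf=0.7):
    """Definition if both directions hold; candidate if one is only near-exact."""
    forward, backward = confidence(x, y), confidence(y, x)
    if forward == 1.0 and backward == 1.0:
        return "definition"
    if max(forward, backward) == 1.0 and min(forward, backward) >= min_conf:
        return "candidate definition"
    return "no definition"


AB, CD = {"a", "b"}, {"c", "d"}
print(round(confidence(AB, CD), 2))       # 0.71
print(confidence(CD, AB))                 # 1.0
print(classify(CD, AB))                   # candidate definition
print(sorted(extent(AB) - extent(CD)))    # ['Person6', 'Person7']
```

The last line lists the entities that violate the weaker direction, which are exactly the ones flagged on the slide as needing completion.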
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow] Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset Cars Videogames Smartphones Countries
Dataset building conditions
Restriction dcsubject dcsubject dcsubject rdftype
dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype
dcsubject dcsubject dcsubject dcsubject
bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank
Front_Person_Shooters computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset Cars Videogames Smartphones Countries
Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982
Dataset characteristics
Mehwish Alam Interactive Knowledge Discovery over Web of Data 40
Evaluation
sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall
sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions
00 02 04 06 08 10
06
07
08
09
10
Recall
Pre
cisi
on
CarsSmartphonesCountriesVideogames
Mehwish Alam Interactive Knowledge Discovery over Web of Data 41
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 42
Mehwish Alam Interactive Knowledge Discovery over Web of Data 43
6DataAnalysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 44
RV-Xplorer (Software Demo)
sbquo Main Topic of Research of a team
sbquo Altering Navigation Space
sbquo Navigation Across Point of Views
sbquo Hide non-interesting parts of lattice
sbquo Search Capabilities
Mehwish Alam Interactive Knowledge Discovery over Web of Data 45
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 ]J C12 C10 C1
1 2 3 4
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 ]J C12 C10 C1 C10
1 2 3 4 5
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting denitions from implications and association rules
sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data
Rule Condence Support Meaning
d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a
Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-
rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and
Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071
sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b
Mehwish Alam Interactive Knowledge Discovery over Web of Data 38
Possible Scenario [Alam et al IJCAI 2015]
ReferenceUniverse
FormalContext
MiningImplications
RankingImplications
Can a rule be adenitionX rdquo Y
Yes
CompleteData
Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov
Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 39
Experimentation

Dataset building conditions
Dataset      Cars              Videogames    Smartphones       Countries
Restriction  dc:subject        dc:subject    dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type      rdf:type          rdf:type
             dc:subject        dc:subject    dc:subject        dc:subject
             bodyStyle         cp            manufacturer      language
             transmission      developer     operativeSystem   govenmentType
             assembly          requirement   developer         leaderType
             designer          genre         cpu               foundingDate
             layout            releaseDate                     gdpPppRank
FPS: Front_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (axis from 0.6 to 1.0) against recall (axis from 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
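One way to read the evaluation is as a precision/recall curve over the ranked list of implications, with the human judgements as ground truth. The sketch below assumes that reading; the list of judgements is hypothetical and only reproduces the "9 out of 10" example from the slide.

```python
def precision_recall_points(judgements):
    """judgements[i] is True when the i-th ranked implication was judged
    to be a definition. Returns one (recall, precision) pair per rank."""
    total = sum(judgements)
    points = []
    hits = 0
    for k, is_definition in enumerate(judgements, start=1):
        hits += is_definition
        if total:
            # Recall is relative to the definitions found in the whole list,
            # so it always reaches 1.0 at the last rank.
            points.append((hits / total, hits / k))
    return points


# Hypothetical judgements: 9 of the 10 ranked implications are definitions.
judgements = [True] * 9 + [False]
points = precision_recall_points(judgements)
print(points[-1])  # (1.0, 0.9)
```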
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

Taxonomy:
⊤
  C12
    C10
      C1, C2
    C11
      C4, C5
  C6
    C13
      C7, C8, C9

Euler tour of the taxonomy, with the depth D of each visited node:
i     1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
node  ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D     0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

i     1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
node  ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D     0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Positions of the classes in the array: A = {4, 12, 20}, B = {4, 18, 22}
A ∪ B = {4A, 4B, 12A, 18B, 20A, 22B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the classes {C1, C13}
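The worked example above can be reproduced with a short Python sketch. It is written under stated assumptions, not as the implementation used in the talk: the taxonomy is encoded as a child-to-parent dictionary with "TOP" standing for ⊤, the RMQ is a naive linear scan rather than a preprocessed structure, positions are 0-based (the slide counts from 1), and only neighbouring entries of the merged list that come from different sets are paired.

```python
# Taxonomy from the slide as child -> parent; "TOP" stands for the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(parent, root="TOP"):
    """Return the Euler tour of the tree and the depth of each visited node."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    tour, depth = [], []

    def visit(node, d):
        tour.append(node)
        depth.append(d)
        for child in children.get(node, []):
            visit(child, d + 1)
            tour.append(node)  # come back to the parent after each child
            depth.append(d)

    visit(root, 0)
    return tour, depth


def rmq(depth, i, j):
    """Position of the smallest depth in depth[i..j], inclusive (naive scan)."""
    return min(range(i, j + 1), key=depth.__getitem__)


def is_proper_ancestor(anc, node, parent):
    while node in parent:
        node = parent[node]
        if node == anc:
            return True
    return False


def intersect(a, b, parent=PARENT, root="TOP"):
    """Most specific common ancestors of two sets of classes."""
    tour, depth = euler_tour(parent, root)
    first = {}
    for pos, node in enumerate(tour):
        first.setdefault(node, pos)  # first occurrence of each class
    merged = sorted([(first[c], "A") for c in a] + [(first[c], "B") for c in b])
    candidates = {
        tour[rmq(depth, p, q)]
        for (p, s), (q, t) in zip(merged, merged[1:])
        if s != t  # pair only neighbours coming from different sets
    }
    # Drop every candidate that is more general than another candidate.
    return {
        c for c in candidates
        if not any(is_proper_ancestor(c, other, parent) for other in candidates)
    }


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

The candidate set before filtering is {C1, C12, ⊤, C13}, which corresponds to positions {4, 8, 15, 19, 21} on the slide; removing the ancestors C12 and ⊤ leaves {C1, C13}.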
Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
[Figure: pattern concept lattice, drawn in layers from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]
Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach ()
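For readers unfamiliar with inter-ordinal scaling, here is a minimal sketch of the usual definition in Formal Concept Analysis: every value v observed for a numerical attribute yields two binary attributes, "<= v" and ">= v". The function name and the sample values are illustrative assumptions; the slides do not show how the scaled contexts were actually built.

```python
def interordinal_scale(values):
    """values: dict mapping each object to a number.
    Returns a dict mapping each object to its set of binary attributes."""
    thresholds = sorted(set(values.values()))
    return {
        obj: {f"<= {t}" for t in thresholds if x <= t}
        | {f">= {t}" for t in thresholds if x >= t}
        for obj, x in values.items()
    }


# Hypothetical numerical column with three objects.
scaled = interordinal_scale({"g1": 1, "g2": 3, "g3": 3})
print(sorted(scaled["g1"]))  # ['<= 1', '<= 3', '>= 1']
print(sorted(scaled["g2"]))  # ['<= 3', '>= 1', '>= 3']
```

The number of binary attributes grows with the number of distinct values, which is one plausible reason why scaling becomes expensive on larger numerical datasets.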
What we did so far
How to allow feedback from domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: the pattern concept lattice K0 to K11 and its concept table (KID, Extent, Intent), as on the slide "Resulting Pattern Concept Lattice"]
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern of the query, with ?paper linked to ?title by dc:title, to ?author by dc:creator and to ?keywords by dc:subject]
Mapping µ: V → U
Given U and V, a mapping µ is a partial function µ: V → U, e.g. µ1: ?keywords → Classification
[Figure: instantiated pattern, with NapoliLS97 linked to title1 by dc:title, to Amedeo_Napoli by dc:creator and to Classification by dc:subject]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2}
• M1 = µ(Av1) = {author1, author2}
• M2 = µ(Av2) = {Web Crawling, RDF}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattice and Their Applications, 2014.
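How a VIEW BY variable splits the answer table into objects and attributes can be sketched as follows. This is an illustration under assumptions: the rows are copied from the slide, the function name view_by is hypothetical, and the real system builds a concept lattice on top of these contexts, which is not shown here.

```python
from collections import defaultdict

# Result rows copied from the slide, as (title, keyword, author).
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, object_var):
    """Group the answers by object_var. Returns one context per remaining
    variable, each as {object: set of attribute values}."""
    obj_idx = columns.index(object_var)
    contexts = {c: defaultdict(set) for c in columns if c != object_var}
    for row in rows:
        for col, value in zip(columns, row):
            if col != object_var:
                contexts[col][row[obj_idx]].add(value)
    return {c: dict(ctx) for c, ctx in contexts.items()}


by_title = view_by(ROWS, COLUMNS, "title")
print(sorted(by_title["author"]["title2"]))  # ['author1', 'author2']

# Changing the VIEW BY variable gives another perspective on the same answers.
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_author["keyword"]["author2"]))  # ['RDF', 'Web Search Engines']
```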
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
Roadmap
• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern with ?x linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a value ≤ 1900 by dbp:birthDate]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: graph pattern with ?x linked to FrenchFilm by dcterms:subject, to Film by rdf:type and to France by dbo:hasCountry]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Predicates: A B C D E
Objects: a b c d e f g
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
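The step from triples to a formal context can be sketched as below. The sketch assumes the simplest possible scaling, where every (predicate, object) pair becomes one binary attribute; the triples are the four shown on the slide, and the function names are illustrative.

```python
# The four triples shown on the slide, as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def formal_context(triples):
    """Return (objects, attributes, incidence) where each attribute is a
    (predicate, object) pair and incidence holds the crosses."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


def cross_table(triples):
    """Render the context as text rows, 'x' for a cross and '.' otherwise."""
    objects, attributes, incidence = formal_context(triples)
    return [
        g + " " + " ".join("x" if (g, m) in incidence else "." for m in attributes)
        for g in objects
    ]


objects, attributes, incidence = formal_context(TRIPLES)
print(len(attributes))       # 4
print(cross_table(TRIPLES))  # ['Person1 x x x x']
```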
Using Range Minimum Query for Computing LCA
Range Minimum Query (RMQ)
Given an array Range Minimum Query nds the minimal value in a sub-array ofan array of comparable objects
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 23
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22Bu
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting denitions from implications and association rules
sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data
Rule Condence Support Meaning
d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a
Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-
rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and
Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071
sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b
Mehwish Alam Interactive Knowledge Discovery over Web of Data 38
Possible Scenario [Alam et al IJCAI 2015]
ReferenceUniverse
FormalContext
MiningImplications
RankingImplications
Can a rule be adenitionX rdquo Y
Yes
CompleteData
Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov
Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 39
Experimentation
Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank
FPS = First_Person_Shooters, cp = computerPlatform
Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
Roadmap
Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries
Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

A = {4, 12, 20}, B = {4, 18, 22}
A ∪ B = {4A, 4B, 12A, 18B, 20A, 22B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
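The worked example above can be replayed with a short Python sketch. It is a minimal illustration, not the implementation used in the experiments: the range minimum is found with a naive scan rather than a preprocessed structure, the root ⊤ is written as "T", and the names `rmq` and `similarity` are hypothetical.

```python
# Euler tour and depth array copied from the example (positions are 1-based).
# The root of the taxonomy is written "T" here.
TOUR = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
        "T C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1,
         0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First occurrence of every node in the tour, as a 1-based position.
FIRST = {}
for pos, node in enumerate(TOUR, start=1):
    FIRST.setdefault(node, pos)

def rmq(i, j):
    """1-based position of the minimum depth between positions i and j (naive scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])

def is_ancestor(x, y):
    """True if x is a proper ancestor of y in the taxonomy."""
    return x != y and TOUR[rmq(FIRST[x], FIRST[y]) - 1] == x

def similarity(a, b):
    """Common subsumers of neighbouring classes in the merged, sorted position list."""
    merged = sorted(FIRST[c] for c in list(a) + list(b))
    hits = {TOUR[rmq(p, q) - 1] for p, q in zip(merged, merged[1:])}
    # Keep only the most specific classes: drop proper ancestors of other hits.
    return sorted(h for h in hits if not any(is_ancestor(h, o) for o in hits))

print(similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))  # ['C1', 'C13']
```

The classes C1 and C13 sit at positions 4 and 21 of the tour, which matches the result {4, 21} of the example.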
Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy of classes rooted at ⊤, with nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
[Figure: lattice with concepts K0 to K11]
Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach
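Inter-ordinal scaling, mentioned above as the way the formal contexts were created for the numerical datasets, can be sketched as follows. This is a minimal sketch of the general technique, assuming one numerical attribute and hypothetical data; the function names are not taken from the original work.

```python
def interordinal_scale(values):
    """Return {object: set of binary attributes} for one numerical attribute.

    For every value v occurring in the data, two binary attributes are
    created: "<= v" and ">= v".
    """
    thresholds = sorted(set(values.values()))
    context = {}
    for obj, x in values.items():
        attrs = {f"<= {v}" for v in thresholds if x <= v}
        attrs |= {f">= {v}" for v in thresholds if x >= v}
        context[obj] = attrs
    return context

def attr_extent(context, attr):
    """Objects that carry the given binary attribute."""
    return {g for g, attrs in context.items() if attr in attrs}

# Hypothetical numerical data: three objects described by one attribute.
ages = {"g1": 20, "g2": 35, "g3": 50}
scaled = interordinal_scale(ages)
for obj in sorted(scaled):
    print(obj, sorted(scaled[obj]))

# The extents of the scaled attributes are nested, which is the order
# relation between structured attributes noted in the bullet above.
print(attr_extent(scaled, "<= 20") <= attr_extent(scaled, "<= 35"))  # True
```

With n distinct values the scaling creates 2n binary attributes, which gives a sense of why the scaled contexts grow quickly on larger datasets.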
What we did so far
How to allow feedback from a domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
4 Creating Views over RDF-Graphs
SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }
[Figure: graph pattern with ?paper linked to ?author (dc:creator), ?title (dc:title) and ?keywords (dc:subject)]
Mapping µ: V → U
Given U and V, a mapping µ is a partial function µ: V → U, e.g. µ1: ?keywords → Classification
[Figure: instance of the pattern, with NapoliLS97 linked to Amedeo_Napoli (dc:creator), title1 (dc:title) and Classification (dc:subject)]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2, …}
• M1 = µ(Av1) = {author1, author2, …}
• M2 = µ(Av2) = {Web Crawling, RDF, …}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
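How a VIEW BY clause turns query results into a formal context can be illustrated in Python. This is a sketch under assumptions: the result rows are hypothetical bindings, `view_by` is an illustrative helper rather than part of any SPARQL engine, and the bindings of the VIEW BY variable are taken as objects while the bindings of the remaining variables become attributes.

```python
from collections import defaultdict

def view_by(rows, variables, object_var):
    """Build a formal context from SPARQL result rows.

    rows       -- list of tuples, one per solution
    variables  -- variable names, in the same order as the tuple fields
    object_var -- the variable named after VIEW BY; its bindings become objects
    """
    idx = variables.index(object_var)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != idx:
                # Attributes are the bindings of the remaining variables.
                context[row[idx]].add((variables[i], value))
    return dict(context)

variables = ["title", "keyword", "author"]
rows = [  # hypothetical bindings
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]

by_title = view_by(rows, variables, "title")    # VIEW BY ?title
by_author = view_by(rows, variables, "author")  # VIEW BY ?author
print(sorted(by_title))
print(sorted(by_author["author1"]))
```

The same result set yields a different context, and therefore a different lattice, depending on which variable is chosen as the object variable.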
Views from Different Perspectives
Example
  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title
Example
  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern with ?x linked to Person (rdf:type), Berlin (dbp:birthPlace) and a date ≤ 1900 (dbp:birthDate)]
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }
French Films
[Figure: ?x linked to FrenchFilm (dcterms:subject), and alternatively to Film (rdf:type) and France (dbo:hasCountry)]
  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm .
  }
  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France .
  }
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4Result = t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (412) = 8Result = t4 8u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting denitions from implications and association rules
sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data
Rule Condence Support Meaning
d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a
Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-
rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and
Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071
sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b
Mehwish Alam Interactive Knowledge Discovery over Web of Data 38
Possible Scenario [Alam et al IJCAI 2015]
ReferenceUniverse
FormalContext
MiningImplications
RankingImplications
Can a rule be adenitionX rdquo Y
Yes
CompleteData
Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov
Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 39
Experimentation
Dataset Cars Videogames Smartphones Countries
Dataset building conditions
Restriction dcsubject dcsubject dcsubject rdftype
dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype
dcsubject dcsubject dcsubject dcsubject
bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank
Front_Person_Shooters computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset Cars Videogames Smartphones Countries
Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982
Dataset characteristics
Mehwish Alam Interactive Knowledge Discovery over Web of Data 40
Evaluation
sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall
sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions
00 02 04 06 08 10
06
07
08
09
10
Recall
Pre
cisi
on
CarsSmartphonesCountriesVideogames
Mehwish Alam Interactive Knowledge Discovery over Web of Data 41
Roadmap
Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries
Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Range Minimum Query - An Implementation
• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?
Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4A, 4B, 12A, 18B, 20A, 22B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
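The computation on this slide can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the sparse-table RMQ, the rule that only neighbouring positions coming from different sets are compared, and the final step that drops every class that is an ancestor of another result are assumptions made here. The names NODES, DEPTH, build_sparse_table, rmq and intersect are hypothetical, TOP stands for ⊤, and positions are 0-based where the slide counts from 1.

```python
# Euler tour of the example taxonomy and the depth of each visited node.
NODES = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def build_sparse_table(depth):
    """table[k][i] is the index of the smallest value in depth[i : i + 2**k]."""
    table = [list(range(len(depth)))]
    span = 2
    while span <= len(depth):
        prev, half = table[-1], span // 2
        row = []
        for i in range(len(depth) - span + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if depth[left] <= depth[right] else right)
        table.append(row)
        span *= 2
    return table


def rmq(table, depth, i, j):
    """Index of the minimum depth in the closed range [i, j] (0-based, i <= j)."""
    k = (j - i + 1).bit_length() - 1
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    return left if depth[left] <= depth[right] else right


def intersect(a, b):
    """Most specific common ancestors of two sets of classes."""
    first, last = {}, {}
    for pos, node in enumerate(NODES):
        first.setdefault(node, pos)
        last[node] = pos
    table = build_sparse_table(DEPTH)
    merged = sorted([(first[c], "A") for c in a] + [(first[c], "B") for c in b])
    found = set()
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:  # assumption: compare only positions from different sets
            found.add(NODES[rmq(table, DEPTH, p, q)])

    def is_ancestor(u, v):
        # u is a strict ancestor of v when v is first visited inside u's tour span
        return u != v and first[u] <= first[v] <= last[u]

    return sorted(c for c in found if not any(is_ancestor(c, d) for d in found))


print(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))  # ['C1', 'C13']
```

C1 and C13 sit at positions 4 and 21 of the slide's 1-based index, which matches the reduced result {4, 21}.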
Resulting Pattern Concept Lattice
Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})
[Figure: taxonomy with root ⊤ over the classes C1, C2 and C4 to C15]
[Figure: Pattern Concept lattice with nodes K0 to K11]
Pattern Concept lattice
KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Details of Pattern Concept lattice
Experimentation
Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s
Experiments on Linked Data
Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec
Experiments with numerical data from Bilkent University
• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using scaling approach ()
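Inter-ordinal scaling, mentioned above, turns one numerical attribute into binary attributes of the form "m ≤ v" and "m ≥ v". The sketch below is only an illustration of that standard construction; the function names and the toy values g1, g2, g3 are hypothetical and are not taken from the Bilkent datasets.

```python
def interordinal_scale(values):
    """Map each object to the binary attributes '<= v' and '>= v' it satisfies."""
    thresholds = sorted(set(values.values()))
    context = {}
    for obj, x in values.items():
        attrs = {f"<= {v}" for v in thresholds if x <= v}
        attrs |= {f">= {v}" for v in thresholds if x >= v}
        context[obj] = attrs
    return context


def extent(context, attribute):
    """Objects that carry the given binary attribute."""
    return {obj for obj, attrs in context.items() if attribute in attrs}


ctx = interordinal_scale({"g1": 1, "g2": 3, "g3": 5})
print(sorted(ctx["g2"]))  # ['<= 3', '<= 5', '>= 1', '>= 3']
# The scaled attributes are ordered: every object with '<= 1' also has '<= 3'.
print(extent(ctx, "<= 1") <= extent(ctx, "<= 3"))  # True
```

Each distinct value adds two binary attributes, so the scaled context grows with the number of distinct values in the data.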
What we did so far
How to allow feedback from domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: concept lattice with nodes K0 to K11]
KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
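To make the C7 example concrete, the sketch below filters the concept table by intent. Treating "not interested in C7" as "hide every concept whose intent contains C7" is an assumption made for illustration; the cited paper may reduce the navigation space differently. CONCEPTS and hide are hypothetical names, and the table is copied from the slide.

```python
# Concept id -> (extent, classes in the intent), copied from the slide's table.
CONCEPTS = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide(concepts, unwanted):
    """Keep only the concepts whose intent does not mention the unwanted class."""
    return {kid: c for kid, c in concepts.items() if unwanted not in c[1]}


visible = hide(CONCEPTS, "C7")
print(sorted(set(CONCEPTS) - set(visible)))  # ['K3', 'K6', 'K7']
```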
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern linking ?paper to ?author (dc:creator), ?title (dc:title) and ?keywords (dc:subject)]
Mapping µ: V → U
Given U and V, a mapping µ is a partial function µ: V → U, e.g. µ1: ?keywords → Classification
[Figure: the same pattern instantiated with NapoliLS97 linked to Amedeo_Napoli (dc:creator), title1 (dc:title) and Classification (dc:subject)]
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1
• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2}
• M1 = µ(Av1) = {author1, author2}
• M2 = µ(Av2) = {Web Crawling, RDF}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
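The effect of VIEW BY can be sketched in Python: the chosen variable supplies the objects G, and every other variable supplies one attribute set. This is a hedged illustration rather than the LBVA implementation; view_by, ROWS and COLUMNS are hypothetical names and the three rows are a shortened, made-up subset of a result table.

```python
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows, columns, object_var):
    """Group result rows by object_var: {object: {other variable: set of values}}."""
    obj_idx = columns.index(object_var)
    others = [c for c in columns if c != object_var]
    context = {}
    for row in rows:
        entry = context.setdefault(row[obj_idx], {c: set() for c in others})
        for col, value in zip(columns, row):
            if col != object_var:
                entry[col].add(value)
    return context


by_title = view_by(ROWS, COLUMNS, "title")
print(sorted(by_title))                       # ['title1', 'title2']
print(sorted(by_title["title2"]["keyword"]))  # ['RDF', 'Web Crawling']

# The same rows seen from the author's perspective (VIEW BY ?author).
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_author["author1"]["title"]))  # ['title1', 'title2']
```

Switching the object variable changes which column plays the role of G while the query results stay the same.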
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: graph pattern where ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: two graph patterns for ?x: dcterms:subject FrenchFilm, or rdf:type Film together with dbo:hasCountry France]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>
Predicates            Objects
Index  URI            Index  URI
A      dc:subject     a      dbpc:Computer_Scientists
                      b      dbpc:Turing_Award_Laureates
B      dbp:award      c      dbp:TuringAward
C      rdf:type       d      dbo:Scientist
D      dbp:field      e      dbp:Computer Sciences
E      dbp:birthPlace f      dbo:UnitedStates
                      g      dbo:UnitedKingdom
[Table: formal context with rows Person1 to Person7 and columns a to g, grouped under predicates A to E. The column positions of the crosses were lost in extraction; the rows carry 6, 5, 5, 4, 4, 2 and 2 crosses respectively.]
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
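The step from triples to a formal context can be sketched as below. It is a minimal illustration that treats each (predicate, object) pair as one binary attribute; the Person6 triple is a made-up row added only so the table shows an empty cell, and build_context and cross_table are hypothetical names.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),  # hypothetical row
]


def build_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


def cross_table(context):
    """Return the sorted attribute list and one 'x'/'.' line per subject."""
    attributes = sorted({a for attrs in context.values() for a in attrs})
    lines = []
    for subject in sorted(context):
        marks = " ".join("x" if a in context[subject] else "." for a in attributes)
        lines.append(f"{subject} {marks}")
    return attributes, lines


attributes, lines = cross_table(build_context(TRIPLES))
for line in lines:
    print(line)
# Person1 x x x x
# Person6 . x . .
```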
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y and Y → X has high confidence, we may be in the presence of incompleteness in the data
Rule         Confidence  Support  Meaning
d ⟹ c        100%        5        Every scientist has won a Turing Award
c ⟹ d        100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d     100%        2        All the people having the field computer science are Turing Award winner scientists
c, d ⟹ a, b  100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example
• Example:
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their descriptions
• In fact there is such a definition: a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
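The confidence value and the entities that need completion can be reproduced with a short sketch. The context below is an assumption reconstructed from the rule table (Person1 to Person5 carry a, b, c and d, while Person6 and Person7 carry only a and b); attributes e, f and g are left out because their positions are not recoverable. extent, confidence and needs_completion are hypothetical names.

```python
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(context, attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, has in context.items() if attrs <= has}


def confidence(context, premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(context, premise)
    if not covered:
        return 0.0  # no supporting entity; convention chosen for this sketch
    return len(extent(context, premise | conclusion)) / len(covered)


def needs_completion(context, premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(g for g in extent(context, premise) if not conclusion <= context[g])


print(confidence(CONTEXT, {"c", "d"}, {"a", "b"}))            # 1.0
print(round(confidence(CONTEXT, {"a", "b"}, {"c", "d"}), 2))  # 0.71
print(needs_completion(CONTEXT, {"a", "b"}, {"c", "d"}))      # ['Person6', 'Person7']
```

Accepting a, b ≡ c, d as a definition would then mean adding c and d to the two listed entities.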
Range Minimum Query - An Implementation
sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
Resulting Pattern Concept Lattice
Entities S Descriptions d
s1 pdc subject tC1C2C7uq
s2 pdc subject tC6C8C9uq
s3 pdc subject tC4C5uq
s4 pdc subject tC4C7C8uq
s5 pdc subject tC8C9uq
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
Pattern Concept lattice
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Details of Pattern Concept lattice
Mehwish Alam Interactive Knowledge Discovery over Web of Data 25
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice, drawn in levels: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID    Extent            Intent
K1     s1, s2, s4, s5    (p1, {C14})
K2     s1, s3, s4        (p1, {C12})
K3     s1, s4            (p1, {C7, C12})
K4     s2, s4, s5        (p1, {C8})
K5     s3, s4            (p1, {C4})
K6     s1                (p1, {C1, C2, C7})
K8     s2, s5            (p1, {C8, C9})
K7     s4                (p1, {C4, C7, C8})
K9     s3                (p1, {C4, C5})
K10    s2                (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
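One simple way to read "the analyst is not interested in C7" is to hide every concept whose intent mentions C7 explicitly. The sketch below does only that, on the concept table of this slide. It is an illustration under that assumption, not the navigation procedure of the cited paper; the function name `hide_class` is hypothetical, and a taxonomy-aware tool could treat ancestors of C7 differently.

```python
# Concepts from the table above: id -> (extent, set of classes in the intent).
concepts = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide_class(concepts, unwanted):
    """Keep only the concepts whose intent does not contain `unwanted`."""
    return {
        kid: (extent, intent)
        for kid, (extent, intent) in concepts.items()
        if unwanted not in intent
    }


visible = hide_class(concepts, "C7")
hidden = sorted(set(concepts) - set(visible))
print("hidden concepts:", hidden)
```

With this reading, K3, K6 and K7 disappear from the navigation space while the papers s1 and s4 remain reachable through more general concepts such as K1 and K2.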
Roadmap
• Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries
• Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs
SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping μ : V → U
Given U and V, a mapping μ is a partial function μ : V → U, e.g., μ1 : ?keywords → Classification

[Figure: an instance of the graph pattern with the nodes title1, NapoliLS97, Amedeo_Napoli and Classification, linked by dc:title, dc:creator and dc:subject]
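A mapping μ as a partial function can be sketched with a dictionary: variables that are absent from the dictionary are simply unbound. The snippet below is a minimal illustration, assuming that in the slide's instance ?paper is bound to NapoliLS97 and ?author to Amedeo_Napoli; the helper name `apply_mapping` is hypothetical.

```python
from typing import Dict, Tuple

# A solution mapping: query variable -> RDF term (partial, so keys may be missing).
Mapping = Dict[str, str]
Triple = Tuple[str, str, str]


def apply_mapping(mu: Mapping, pattern: Triple) -> Triple:
    """Replace each bound variable of a triple pattern by its RDF term.

    Terms that are not keys of `mu` (constants or unbound variables)
    are left unchanged.
    """
    s, p, o = pattern
    return (mu.get(s, s), mu.get(p, p), mu.get(o, o))


# Assumed bindings, read off the instance figure of the slide.
mu1 = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?keywords": "Classification",
}

print(apply_mapping(mu1, ("?paper", "dc:subject", "?keywords")))
print(apply_mapping(mu1, ("?paper", "dc:title", "?title")))  # ?title stays unbound
```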
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title    ?keyword              ?author
title1    RDF                   author1
title2    Web Crawling          author1
title2    Web Search Engines    author2
title2    RDF                   author1
title2    RDF                   author2
title2    Web Crawling          author1

• Ev = {?title}
• Av = {?author, ?keyword}
• G = μ(Ev) = {title1, title2}
• M1 = μ(Av1) = {author1, author2}
• M2 = μ(Av2) = {Web Crawling, RDF}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
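The effect of VIEW BY can be sketched as follows: the values of the chosen variable become the objects G, and each remaining variable contributes one attribute set with its own incidence relation. This is a minimal sketch over the result rows of the slide, not the implementation from the cited paper; the function name `view_by` and the data layout are assumptions.

```python
from collections import defaultdict

# Result rows of the query, in the column order shown on the slide.
columns = ("title", "keyword", "author")
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, object_var):
    """Return (G, contexts) for the view on `object_var`.

    G is the sorted list of objects; contexts maps every other variable
    to an incidence relation {object: set of values of that variable}.
    """
    idx = columns.index(object_var)
    objects = sorted({row[idx] for row in rows})
    contexts = {}
    for j, var in enumerate(columns):
        if j == idx:
            continue
        incidence = defaultdict(set)
        for row in rows:
            incidence[row[idx]].add(row[j])
        contexts[var] = dict(incidence)
    return objects, contexts


G, ctx = view_by(rows, columns, "title")            # VIEW BY ?title
G_auth, ctx_auth = view_by(rows, columns, "author")  # VIEW BY ?author
print(G, ctx["author"])
print(G_auth, ctx_auth["title"])
```

Switching the `object_var` argument is all it takes to look at the same answers from another perspective, with authors instead of titles as objects.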
Views from Different Perspectives

Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
Roadmap
• Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries
• Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern: ?x has dcterms:subject FrenchFilm; alternatively, ?x has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index    URI             Index    URI
A        dc:subject      a        dbpc:Computer_Scientists
                         b        dbpc:Turing_Award_Laureates
B        dbp:award       c        dbp:TuringAward
C        rdf:type        d        dbo:Scientist
D        dbp:field       e        dbp:Computer Sciences
E        dbp:birthPlace  f        dbo:UnitedStates
                         g        dbo:UnitedKingdom

Predicates: A B C D E
Objects: a b c d e f g
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
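The construction of the context can be sketched as follows: subjects become objects, and each distinct (predicate, object) pair becomes one binary attribute. This is a minimal sketch assuming that simple form of scaling; the function name `build_context` is hypothetical, and only the four Person1 triples listed on the slide are used.

```python
# Triples as (subject, predicate, object), copied from the slide.
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def build_context(triples):
    """Return (objects, attributes, incidence) of the formal context.

    objects    : sorted subjects
    attributes : sorted (predicate, object) pairs
    incidence  : {subject: set of (predicate, object) pairs}, one entry
                 per cross in the context table
    """
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {s: set() for s in objects}
    for s, p, o in triples:
        incidence[s].add((p, o))
    return objects, attributes, incidence


objects, attributes, incidence = build_context(triples)
for attribute in attributes:
    mark = "x" if attribute in incidence["Person1"] else " "
    print(mark, attribute)
```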
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y holds and Y → X has high confidence, the data may be incomplete

Rule           Confidence    Support    Meaning
d ⟹ c          100%          5          Every scientist has won a Turing Award
c ⟹ d          100%          5          Every person who has won a Turing Award is a scientist
e ⟹ c, d       100%          2          All the people having the field computer science are Turing Award winning scientists
c, d ⟹ a, b    100%          5          All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d    71%           7          71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award
Association rules for the running example

• Example: c, d ⟹ a, b and conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: a, b ≡ c, d, i.e., a, b ⟹ c, d and c, d ⟹ a, b
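The figures in the rule table can be checked with a few lines of code. The context below is an assumption: the extracted slide no longer shows which column each cross belongs to, so the rows were chosen to agree with the number of crosses per person and with the supports and confidences reported in the table. The helper names are hypothetical.

```python
# Assumed context, consistent with the reported supports and confidences.
context = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def support(itemset):
    """Number of entities whose description contains every attribute of itemset."""
    return sum(1 for attrs in context.values() if set(itemset) <= attrs)


def confidence(premise, conclusion):
    """conf(premise -> conclusion) = support(premise and conclusion) / support(premise)."""
    return support(set(premise) | set(conclusion)) / support(premise)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(
        entity
        for entity, attrs in context.items()
        if set(premise) <= attrs and not set(conclusion) <= attrs
    )


print(confidence("cd", "ab"))            # implication: confidence 1.0
print(round(confidence("ab", "cd"), 2))  # association rule: 5 of 7 entities
print(needs_completion("ab", "cd"))
```

Since c, d ⟹ a, b holds and the converse rule fails only for two entities, those two entities are the candidates whose descriptions would be completed with c and d.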
Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions

Dataset        Cars                Videogames     Smartphones          Countries
Restriction    dc:subject          dc:subject     dc:subject           rdf:type
               dbpc:Sports_cars    dbpc:FPS       dbpc:Smartphones     Country
Predicates     rdf:type            rdf:type       rdf:type             rdf:type
               dc:subject          dc:subject     dc:subject           dc:subject
               bodyStyle           cp             manufacturer         language
               transmission        developer      operativeSystem      governmentType
               assembly            requirement    developer            leaderType
               designer            genre          cpu                  foundingDate
               layout              releaseDate                         gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics

Dataset          Cars     Videogames    Smartphones    Countries
Subjects         529      655           363            3153
Objects          1291     3265          495            8315
Triples          12519    20146         4710           50000
Concepts         14657    31031         1232           13754
Exec time [s]    1732     1714          07             5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
Roadmap
• Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries
• Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities
Experimentation
Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s
Biomedical Data 63 1490 933 1725582 145s 162s
Experiments on Linked Data
Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec
NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec
PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec
Experiments with numerical data from Bilkent University
sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q
1Ď pm2q
1 means that m1 ď m2
sbquo Not all datasets could be processed using scaling approach ()
Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
What we did so far
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
How to allow feedback from domain expert
Mehwish Alam Interactive Knowledge Discovery over Web of Data 27
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Navigating Concept Lattice
sbquo Entities are papers and descriptions are the classes from ACCS
sbquo Consider that the analyst is not interested in C7 ie papers on QuestionAnswering
K0
K1 K2
K4 K3 K5
K8
K10 K7 K6 K9
K11
KID Extent Intent
K1 s1 s2 s4 s5 pp1 tC14uq
K2 s1 s3 s4 pp1 tC12uq
K3 s1 s4 pp1 tC7C12uq
K4 s2 s4 s5 pp1 tC8uq
K5 s3 s4 pp1 tC4uq
K6 s1 pp1 tC1C2C7uq
K8 s2 s5 pp1 tC8C9uq
K7 s4 pp1 tC4C7C8uq
K9 s3 pp1 tC4C5uq
K10 s2 pp1 tC6C8C9uq
Interactive Exploration over RDF Data using Formal Concept Analysis Mehwish Alam
Amedeo Napoli IEEE International Conference on Data Science and Advanced Analytics 2015
Mehwish Alam Interactive Knowledge Discovery over Web of Data 28
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
French Films, described by type and country
[Figure: ?x with rdf:type Film and dbo:hasCountry France]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

A B C D E
a b c d e f g
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
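As a rough sketch of this step, the triples of one subject can be turned into a row of the formal context by treating each (predicate, object) pair as a binary attribute. The function name `triples_to_context` and the tuple encoding are assumptions made for the illustration; the four triples are the ones listed above.

```python
# Triples for Person1, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def triples_to_context(triples):
    """Hypothetical helper: map each subject to its set of
    (predicate, object) attributes, i.e. the crosses of its row."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


def has_cross(context, subject, predicate, obj):
    """True when the cross table has a cross for this triple."""
    return (predicate, obj) in context.get(subject, set())


context = triples_to_context(TRIPLES)
print(len(context["Person1"]))
print(has_cross(context, "Person1", "rdf:type", "dbo:Scientist"))
print(has_cross(context, "Person1", "dbp:award", "dbp:TuringAward"))
```

A missing cross only says that the triple is absent from the data, not that the statement is false, which is what makes completion meaningful.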
Getting definitions from implications and association rules
• If X ⟹ Y and Y ⟹ X, then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y holds and Y → X has high confidence, the data may be incomplete

Rule          Confidence  Support  Meaning
d ⟹ c         100%        5        Every scientist has won a Turing Award
c ⟹ d         100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d      100%        2        Every person whose field is computer science is a Turing Award-winning scientist
c, d ⟹ a, b   100%        5        All scientists who won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d   71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example:
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
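The confidence value in the example can be checked with a short computation. The sketch below is illustrative only: the cross table is restricted to attributes a to d and assumes, as the rule table implies, that Person1 to Person5 carry a, b, c and d while Person6 and Person7 carry only a and b. The helper names are hypothetical.

```python
# Assumed cross table restricted to attributes a-d (see the lead-in).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Objects whose description contains every attribute in `attributes`."""
    return {obj for obj, desc in CONTEXT.items() if attributes <= desc}


def confidence(premise, conclusion):
    """Share of the objects satisfying `premise` that also satisfy `conclusion`."""
    covered = len(extent(premise))
    if covered == 0:
        return 0.0
    return len(extent(premise | conclusion)) / covered


def completion_candidates(premise, conclusion):
    """Objects that satisfy the premise but miss part of the conclusion."""
    return sorted(extent(premise) - extent(premise | conclusion))


print(confidence({"c", "d"}, {"a", "b"}))            # implication: 1.0
print(round(confidence({"a", "b"}, {"c", "d"}), 2))  # association rule: 0.71
print(completion_candidates({"a", "b"}, {"c", "d"}))
```

Under these assumptions the candidates returned are Person6 and Person7, the two entities whose descriptions would be completed if a, b ≡ c, d is accepted as a definition.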
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow steps, in order: Reference Universe, Formal Context, Mining Implications, Ranking Implications, "Can a rule be a definition X ≡ Y?", Yes, Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
What we did so far
How to allow feedback from a domain expert?
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: concept lattice drawn in levels: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID  Extent            Intent
K1   {s1, s2, s4, s5}  (p1, {C14})
K2   {s1, s3, s4}      (p1, {C12})
K3   {s1, s4}          (p1, {C7, C12})
K4   {s2, s4, s5}      (p1, {C8})
K5   {s3, s4}          (p1, {C4})
K6   {s1}              (p1, {C1, C2, C7})
K8   {s2, s5}          (p1, {C8, C9})
K7   {s4}              (p1, {C4, C7, C8})
K9   {s3}              (p1, {C4, C5})
K10  {s2}              (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
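One simple way to picture the effect of the analyst's feedback is to filter out every concept whose intent mentions the unwanted class. The sketch below is only an illustration of that idea, not the navigation-space mechanism of RV-Xplorer: the function name `hide_class` is hypothetical, and the intents keep only the class sets from the table, leaving out the p1 component.

```python
# Concepts from the table: id -> (extent, classes in the intent).
CONCEPTS = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide_class(concepts, unwanted):
    """Hypothetical filter: keep only concepts whose intent
    does not contain the unwanted class."""
    return {
        kid: (extent, intent)
        for kid, (extent, intent) in concepts.items()
        if unwanted not in intent
    }


remaining = hide_class(CONCEPTS, "C7")
hidden = sorted(set(CONCEPTS) - set(remaining))
print(hidden)
```

With the table above, the hidden concepts are K3, K6 and K7, the three whose intent contains C7.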
4 Creating Views over RDF-Graphs
SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}
[Figure: graph pattern with ?paper linked to ?author by dc:creator, to ?title by dc:title, and to ?keywords by dc:subject]
Mapping μ: V → U
Given U and V, a mapping μ is a partial function μ: V → U, e.g. μ1: ?keywords → Classification
[Figure: the graph pattern under a mapping: NapoliLS97 linked to Amedeo_Napoli by dc:creator, to title1 by dc:title, and to Classification by dc:subject]
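A partial function from variables to terms can be pictured as a dictionary applied to the triple patterns of the query. The following sketch is a loose illustration under that reading: the name `apply_mapping` is hypothetical, and the values bound to the variables are the ones shown in the figure.

```python
# Triple patterns of the query, written as (subject, predicate, object).
PATTERN = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

# A mapping as a dictionary; variables left out stay unbound,
# which is what makes the function partial.
MU = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?title": "title1",
    "?keywords": "Classification",
}


def apply_mapping(pattern, mapping):
    """Replace every mapped variable by its value; keep the rest untouched."""
    return [
        tuple(mapping.get(term, term) for term in triple)
        for triple in pattern
    ]


for triple in apply_mapping(PATTERN, MU):
    print(triple)

# A partial mapping binds only ?keywords, as in the example μ1.
print(apply_mapping(PATTERN, {"?keywords": "Classification"})[2])
```

Each solution of the query corresponds to one such mapping, so the result table is simply the list of mappings projected onto the selected variables.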
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}

title    keyword             author
title1   RDF                 author1
title2   Web Crawling        author1
title2   Web Search Engines  author2
title2   RDF                 author1
title2   RDF                 author2
title2   Web Crawling        author1
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword            ?author
title1   RDF                 author1
title2   Web Crawling        author1
title2   Web Search Engines  author2
title2   RDF                 author1
title2   RDF                 author2
title2   Web Crawling        author1

• E_v = {?title}
• A_v = {?author, ?keyword}
• G = µ(E_v) = {title1, title2}
• M1 = µ(A_v1) = {author1, author2}
• M2 = µ(A_v2) = {Web Crawling, RDF}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
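The effect of VIEW BY can be sketched as a regrouping of the result rows: the chosen variable supplies the objects, and the remaining variables supply the attributes. The code below is a minimal sketch under that reading. The function name `view_by`, the row format and the three sample rows (adapted from the result table) are assumptions made for illustration.

```python
from collections import defaultdict

# A few result rows, adapted from the result table of the query.
rows = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]

def view_by(rows, object_var):
    """Group rows into a formal context: each value of object_var becomes an
    object, described by the set of (variable, value) pairs it co-occurs with."""
    context = defaultdict(set)
    for row in rows:
        obj = row[object_var]
        for var, value in row.items():
            if var != object_var:
                context[obj].add((var, value))
    return dict(context)

by_title = view_by(rows, "title")    # objects are titles
by_author = view_by(rows, "author")  # same rows, objects are authors
```

Calling `view_by` with a different variable gives another perspective on the same answers without running the query again.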
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
Roadmap
Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries
Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
[Figure: ?x with rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
[Figure: ?x with dcterms:subject FrenchFilm, shown alongside ?x with rdf:type Film and dbo:hasCountry France]
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
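One way to read this slide is that the same set of resources can be reached either through a category or through a description, and that the two answers differ when the data is incomplete. The toy example below illustrates that reading. The resources `film1` to `film3` and their triples are invented for the example and are not taken from DBpedia.

```python
# Hypothetical triples: film2 lacks its category, film3 lacks its description.
triples = {
    ("film1", "dcterms:subject", "category:FrenchFilm"),
    ("film1", "rdf:type", "dbo:Film"),
    ("film1", "dbo:hasCountry", "dbp:France"),
    ("film2", "rdf:type", "dbo:Film"),
    ("film2", "dbo:hasCountry", "dbp:France"),
    ("film3", "dcterms:subject", "category:FrenchFilm"),
}

def subjects(predicate, obj):
    """Return every subject that has the given predicate/object pair."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}

# Query 1: resources annotated with the category.
by_category = subjects("dcterms:subject", "category:FrenchFilm")

# Query 2: resources typed as films whose country is France.
by_description = subjects("rdf:type", "dbo:Film") & subjects("dbo:hasCountry", "dbp:France")

# Resources that one query finds and the other misses are completion candidates.
missing_category = by_description - by_category
missing_description = by_category - by_description
```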
From RDF Triples to Formal Context
RDF triples:
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer_Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Predicates: A B C D E
Objects: a b c d e f g
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
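The construction can be sketched in a few lines. The sketch assumes that scaling turns every (predicate, object) pair into one binary attribute, and it uses only the four Person1 triples listed on the slide. The function name `build_context` is illustrative.

```python
# The four triples listed on the slide, with prefixed names kept as strings.
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

def build_context(triples):
    """Return (objects, attributes, incidence) for a list of RDF triples.

    Subjects become formal objects, (predicate, object) pairs become formal
    attributes, and each triple contributes one cross to the incidence relation.
    """
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence

objects, attributes, incidence = build_context(triples)

# One row of the cross table: True where Person1 has the attribute.
row = [("Person1", attr) in incidence for attr in attributes]
```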
Getting definitions from implications and association rules
• If X ⇒ Y and Y ⇒ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⇒ Y and Y → X has high confidence, we may be in the presence of incompleteness in the data

Rule          Confidence  Support  Meaning
d ⇒ c         100%        5        Every scientist has won a Turing Award
c ⇒ d         100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c,d       100%        2        Everyone whose field is computer science is a scientist who has won a Turing Award
c,d ⇒ a,b     100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a,b → c,d     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example:
  - {c, d} ⇒ {a, b}
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition {a, b} ≡ {c, d}, i.e. {a, b} ⇒ {c, d} and {c, d} ⇒ {a, b}
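The confidence figures can be checked with a small computation. The context below is an assumption: it keeps only attributes a, b, c and d, with a and b held by all seven persons and c and d by Person1 to Person5, which is consistent with the supports reported in the table. The exact cross positions of the original table are not reproduced here.

```python
# Illustrative context restricted to attributes a, b, c, d (assumed layout).
context = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attrs):
    """Objects whose description contains every attribute in attrs."""
    return {g for g, desc in context.items() if attrs <= desc}

def confidence(premise, conclusion):
    """Fraction of the objects satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))

conf_forward = confidence({"c", "d"}, {"a", "b"})   # implication: holds for every object
conf_backward = confidence({"a", "b"}, {"c", "d"})  # association rule: 5 objects out of 7

# Objects that break the converse rule are the candidates for completion.
to_complete = extent({"a", "b"}) - extent({"c", "d"})
```

If {a, b} ≡ {c, d} is accepted as a definition, the objects in `to_complete` are the ones whose descriptions would receive the missing attributes c and d.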
Possible Scenario [Alam et al., IJCAI 2015]
[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition X ≡ Y?" and, on "Yes", Complete Data]
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
Dataset building conditions

Dataset      Cars              Videogames    Smartphones       Countries
Restriction  dc:subject        dc:subject    dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type      rdf:type          rdf:type
             dc:subject        dc:subject    dc:subject        dc:subject
             bodyStyle         cp            manufacturer      language
             transmission      developer     operativeSystem   governmentType
             assembly          requirement   developer         leaderType
             designer          genre         cpu               foundingDate
             layout            releaseDate                     gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
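A precision/recall curve of this kind can be computed from a ranked list of implications and the evaluator's yes/no judgments. The sketch below assumes that recall is measured against the implications the evaluator accepted, so the full list always reaches 100% recall. The judgments in `labels` are invented for the example.

```python
def precision_recall_points(labels):
    """labels: ranked list of booleans, True when the evaluator judged that the
    implication can be turned into a definition.

    Returns one (recall, precision) pair per accepted implication."""
    total_accepted = sum(labels)
    if total_accepted == 0:
        return []
    points = []
    hits = 0
    for k, accepted in enumerate(labels, start=1):
        if accepted:
            hits += 1
            points.append((hits / total_accepted, hits / k))
    return points

# Hypothetical ranking: 9 of the 10 implications are accepted.
labels = [True, True, False, True, True, True, True, True, True, True]
points = precision_recall_points(labels)
final_recall, final_precision = points[-1]
```

With these judgments the last point has recall 1.0 and precision 0.9, which matches the reading "9 out of 10 implications can be transformed into definitions".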
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
Navigating Concept Lattice
• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering
[Figure: concept lattice, from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})
Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
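Hiding the part of the lattice that mentions C7 can be pictured as a filter over the concept table. The sketch below is an illustration only and does not describe how RV-Xplorer implements the feature. It copies the extents and intents from the table and drops the p1 component of each intent for brevity.

```python
# Concept table from the slide: id -> (extent, classes in the intent).
concepts = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}

def hide_class(concepts, unwanted):
    """Keep only the concepts whose intent does not contain the unwanted class."""
    return {
        kid: (ext, intent)
        for kid, (ext, intent) in concepts.items()
        if unwanted not in intent
    }

visible = hide_class(concepts, "C7")
hidden = sorted(set(concepts) - set(visible))
```

With the table above, the concepts removed from the navigation space are K3, K6 and K7, the three whose intents contain C7.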
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 29
Mehwish Alam Interactive Knowledge Discovery over Web of Data 30
4Creating Views over RDF-Graphs
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
SPARQL
PREFIX rdfslthttpwwww3org200001rdf-schemagt
PREFIX dclthttppurlorgdctermsgt
SELECT distinct title keywords author
where
paper dccreator author
paper dctitle title
paper dcsubject keywords
FILTER(
regex(STR(keywords) pattern based classification i)
|| regex(STR(keywords) unsupervised classification i))
title
paper author
keywords
dccreator
dctitle
dcsubject
Mapping micro V Ntilde U
Given U and V a mapping micro is a partialfunction micro V Ntilde U egmicro1 keywords Ntilde Classication
title1
NapoliLS97 Amedeo_Napoli
Classication
dccreator
dctitle
dcsubject
Mehwish Alam Interactive Knowledge Discovery over Web of Data 31
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
sbquo Ev ldquo ttitleu
sbquo Av ldquo tauthor keywordu
sbquo G ldquo micropEvq ldquo ttitle1 title2 u
sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u
sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Views from Dierent Perspectives
Example
SELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY title
Example
SELECT author title keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||
(keyword ldquoRDFrdquoldquoirdquo))
VIEW BY author
Mehwish Alam Interactive Knowledge Discovery over Web of Data 33
Roadmap
Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries
Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization
Mehwish Alam Interactive Knowledge Discovery over Web of Data 34
Mehwish Alam Interactive Knowledge Discovery over Web of Data 35
5Completing RDFData
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer_Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Attribute columns, grouped by predicate: A: a, b | B: c | C: d | D: e | E: f, g
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
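The scaling step can be sketched as follows, assuming the simplest reading of the slide: each distinct (predicate, object) pair becomes one binary attribute of the subject. The function names are hypothetical and only the four Person1 triples listed above are used, so this is an illustration rather than the authors' implementation.

```python
# The four triples about Person1 listed on the slide (prefixed names as strings).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def triples_to_context(triples):
    """Scale triples into a binary context: subject -> set of (predicate, object)."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


def to_cross_table(context):
    """Render the context as rows of '×' (triple present) or '.' (absent)."""
    attributes = sorted({attr for attrs in context.values() for attr in attrs})
    rows = {
        subject: ["×" if attr in attrs else "." for attr in attributes]
        for subject, attrs in context.items()
    }
    return attributes, rows


context = triples_to_context(TRIPLES)
attributes, rows = to_cross_table(context)
```

With more subjects in TRIPLES, a '.' in a row marks a (predicate, object) pair that some other subject has but this one lacks, which is exactly where incompleteness can hide.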
Getting definitions from implications and association rules
• If X ⇒ Y and Y ⇒ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⇒ Y holds and Y → X has high confidence, the data may be incomplete

Rule         Confidence  Support  Meaning
d ⇒ c        100%        5        Every scientist has won a Turing Award
c ⇒ d        100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c,d      100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
c,d ⇒ a,b    100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a,b → c,d    71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award
Association rules for the running example

• Example:
  - {c, d} ⇒ {a, b}
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: {a, b} ≡ {c, d}, i.e. {a, b} ⇒ {c, d} and {c, d} ⇒ {a, b}
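The confidence value on this slide can be reproduced with a small sketch. The context below is an assumed reconstruction restricted to attributes a to d: five persons carry a, b, c and d, while Person6 and Person7 carry only a and b. Attributes e, f and g are left out because their exact placement is not needed here, and all function names are hypothetical.

```python
# Assumed reconstruction of the running example, restricted to attributes a-d.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def support(context, attrs):
    """Number of objects that have every attribute in attrs."""
    return len(extent(context, attrs))


def confidence(context, premise, conclusion):
    """Share of the objects satisfying the premise that also satisfy the conclusion."""
    covered = support(context, premise)
    if covered == 0:
        raise ValueError("premise holds for no object")
    return support(context, premise | conclusion) / covered


def is_definition(context, x, y):
    """X ≡ Y holds in the data when both directions are implications."""
    return confidence(context, x, y) == 1.0 and confidence(context, y, x) == 1.0


AB, CD = {"a", "b"}, {"c", "d"}
conf_cd_ab = confidence(CONTEXT, CD, AB)                 # 5 / 5
conf_ab_cd = confidence(CONTEXT, AB, CD)                 # 5 / 7
incomplete = extent(CONTEXT, AB) - extent(CONTEXT, CD)   # candidates for completion
```

Here one direction is an implication and the other only an association rule with confidence 5/7, so {a, b} ≡ {c, d} does not hold in the data as given. The objects in incomplete are the ones whose descriptions would have to be completed for the definition to hold.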
Possible Scenario [Alam et al., IJCAI 2015]
Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
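The last step of the scenario, completing the data once a rule has been accepted as a definition, might look like the sketch below. It assumes the decision that X ≡ Y is a definition has already been taken (for instance by an expert) and only shows the mechanical part. The function name, the toy context and the attribute letters are illustrative assumptions.

```python
def complete(context, x, y):
    """Apply the accepted definition x ≡ y: any object having all of one side
    receives the missing attributes of the other side.

    The context is modified in place; the list of (object, attribute)
    additions is returned so that they can be reviewed.
    """
    added = []
    for obj, attrs in context.items():
        for premise, conclusion in ((x, y), (y, x)):
            if premise <= attrs:
                for attr in sorted(conclusion - attrs):
                    attrs.add(attr)
                    added.append((obj, attr))
    return added


# Toy context: Person6 and Person7 lack c and d.
toy_context = {
    "Person1": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}
additions = complete(toy_context, {"a", "b"}, {"c", "d"})
```

Returning the additions rather than applying them silently keeps each proposed triple open to inspection before it is written back to the RDF data.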
Experimentation
Dataset building conditions
Dataset       Cars              Videogames    Smartphones       Countries
Restriction   dc:subject        dc:subject    dc:subject        rdf:type
              dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones  Country
Predicates    rdf:type          rdf:type      rdf:type          rdf:type
              dc:subject        dc:subject    dc:subject        dc:subject
              bodyStyle         cp            manufacturer      language
              transmission      developer     operativeSystem   governmentType
              assembly          requirement   developer         leaderType
              designer          genre         cpu               foundingDate
              layout            releaseDate                     gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset

Dataset Characteristics
Dataset         Cars     Videogames  Smartphones  Countries
Subjects        529      655         363          3,153
Objects         1,291    3,265       495          8,315
Triples         12,519   20,146      4,710        50,000
Concepts        14,657   31,031      1,232        13,754
Exec. time [s]  17.32    17.14       0.7          59.82
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
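How such precision and recall points arise from a ranked list can be sketched as follows. The judgements are hypothetical (nine accepted implications followed by one rejected) and the function name is an assumption. Since the human evaluation of the whole list serves as ground truth, recall necessarily reaches 1.0 at the end of the list.

```python
def precision_recall_points(judgements):
    """judgements: booleans in rank order, True if the implication at that
    rank was judged transformable into a definition.

    Returns one (recall, precision) pair per rank cut-off.
    """
    total_relevant = sum(judgements)
    if total_relevant == 0:
        raise ValueError("no implication was judged to be a definition")
    points = []
    hits = 0
    for k, accepted in enumerate(judgements, start=1):
        hits += accepted
        points.append((hits / total_relevant, hits / k))
    return points


# Hypothetical ranked list: 9 of 10 implications accepted by the evaluator.
JUDGEMENTS = [True] * 9 + [False]
points = precision_recall_points(JUDGEMENTS)
recall_at_end, precision_at_end = points[-1]
```

With these judgements the final point has precision 0.9 and recall 1.0, which matches the reading of "9 out of 10 implications" given in the bullet above.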
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities
Lattice-Based View Access [Alam et al CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE
paper dctitle title
paper dccreator author
paper dcsubject keyword
filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))
title keyword author
title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1
Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam
Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014
Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
Lattice-Based View Access [Alam et al., CLA 2014]
SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
title   keyword             author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1
- Ev = {title}
- Av = {author, keyword}
- G = μ(Ev) = {title1, title2}
- M1 = μ(Av1) = {author1, author2}
- M2 = μ(Av2) = {Web Crawling, RDF}
Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
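The mapping from query results to the sets G, M1 and M2 can be sketched in Python. This is only a minimal illustration, not the authors' LBVA implementation: the function name `view_sets` and the sample rows are assumptions made for the example, and the rows were chosen so that they agree with the sets listed on the slide.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical solutions of the query (one dict per row); illustrative data only.
ROWS: List[Dict[str, str]] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_sets(rows: List[Dict[str, str]], view_by: str) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Collect the object set and the attribute sets for a VIEW BY variable.

    The distinct values bound to `view_by` play the role of G; for every
    other variable, its distinct values form one attribute set (M1, M2, ...).
    """
    objects: Set[str] = {row[view_by] for row in rows}
    attribute_sets: Dict[str, Set[str]] = {}
    for row in rows:
        for variable, value in row.items():
            if variable != view_by:
                attribute_sets.setdefault(variable, set()).add(value)
    return objects, attribute_sets


G, M = view_sets(ROWS, "title")
print(sorted(G))             # objects of the view
print(sorted(M["author"]))   # first attribute set
print(sorted(M["keyword"]))  # second attribute set
```

Changing the second argument of `view_sets` to another variable of the SELECT clause switches the object set, which is the effect the VIEW BY clause is meant to have.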
Views from Different Perspectives
Example
SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title
Example
SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
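The two examples differ only in the VIEW BY variable, so the same result rows yield two different object-attribute tables. The sketch below groups rows by the chosen column; the helper `build_view` and the three rows are hypothetical and serve only to make the change of perspective concrete.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

COLUMNS = ("title", "author", "keyword")

# Hypothetical result rows, in the column order given above.
ROWS: List[Tuple[str, str, str]] = [
    ("title1", "author1", "RDF"),
    ("title2", "author1", "Web Crawling"),
    ("title2", "author2", "RDF"),
]


def build_view(rows, columns, view_by) -> Dict[str, Set[Tuple[str, str]]]:
    """Group rows by the VIEW BY column.

    Each distinct value of that column becomes an object; every
    (column, value) pair from the remaining columns becomes one of
    its attributes.
    """
    key_index = columns.index(view_by)
    view = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != key_index:
                view[row[key_index]].add((columns[i], value))
    return dict(view)


by_title = build_view(ROWS, COLUMNS, "title")
by_author = build_view(ROWS, COLUMNS, "author")

for name, view in (("VIEW BY title", by_title), ("VIEW BY author", by_author)):
    print(name)
    for obj in sorted(view):
        print("  ", obj, sorted(view[obj]))
```

In the first view the papers are described by their authors and keywords; in the second the authors are described by their papers and keywords.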
Roadmap
Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries
Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization
5 Completing RDF Data
Problem Statement
People who were born in Berlin before 1900
Graph pattern: x -rdf:type-> Person; x -dbp:birthPlace-> Berlin; x -dbp:birthDate-> (≤ 1900)
SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}
French Films
Graph pattern: x -dcterms:subject-> FrenchFilm
SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}
Graph pattern: x -rdf:type-> Film; x -dbo:hasCountry-> France
SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}
From RDF Triples to Formal Context
RDF triples
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>
Predicates                Objects
Index  URI                Index  URI
A      dc:subject         a      dbpc:Computer_Scientists
                          b      dbpc:Turing_Award_Laureates
B      dbp:award          c      dbp:TuringAward
C      rdf:type           d      dbo:Scientist
D      dbp:field          e      dbp:Computer Sciences
E      dbp:birthPlace     f      dbo:UnitedStates
                          g      dbo:UnitedKingdom
Formal context (columns A: a, b; B: c; C: d; D: e; E: f, g)
Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×
The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
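A minimal Python sketch of this construction follows. It assumes one simple scaling, in which each (predicate, object) pair becomes one binary attribute and each subject becomes an object of the context; the function names are hypothetical, and only the four Person1 triples shown on the slide are used as input.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Attribute = Tuple[str, str]

# The four triples listed on the slide for Person1.
TRIPLES: List[Triple] = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def triples_to_context(triples: List[Triple]) -> Dict[str, Set[Attribute]]:
    """Map each subject to the set of (predicate, object) pairs it carries.

    Assumed scaling: one binary attribute per (predicate, object) pair,
    so every triple contributes exactly one cross.
    """
    context: Dict[str, Set[Attribute]] = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


def print_cross_table(context: Dict[str, Set[Attribute]]) -> None:
    """Print one row per object with 'x' where the attribute holds."""
    attributes = sorted({a for attrs in context.values() for a in attrs})
    for obj in sorted(context):
        cells = ["x" if a in context[obj] else "." for a in attributes]
        print(obj, " ".join(cells))


CONTEXT = triples_to_context(TRIPLES)
print_cross_table(CONTEXT)
```

Feeding the triples of further subjects into `triples_to_context` adds one row per subject, which is how a table such as the one for Person1 to Person7 would be filled.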
Getting definitions from implications and association rules
- If X ⟹ Y and Y ⟹ X then X ≡ Y
- X ≡ Y is called a definition
- If X ⟹ Y and Y → X has high confidence, we may be in the presence of incompletion in data
Rule | Confidence | Support | Meaning
d ⟹ c | 100% | 5 | Every scientist has won a Turing Award
c ⟹ d | 100% | 5 | Every person who has won a Turing Award is a scientist
e ⟹ c, d | 100% | 2 | All the people having the field computer science are Turing Award winner scientists
c, d ⟹ a, b | 100% | 5 | All the Scientists winning Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won Turing Award
Association rules for the running example
- Example:
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
- Entities Person6 and Person7 need completion in their description
- In fact there is such a definition: a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
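The confidence values of the running example can be checked with a few lines of Python. The context below is an assumption: it matches the number of crosses per person and the rule table, but the exact placement of the attributes e, f and g is not recoverable from the slide and was chosen arbitrarily. The helper names are hypothetical.

```python
from typing import Dict, List, Set

# Assumed context: a, b, c, d follow the rule table; e, f, g are placed arbitrarily.
CONTEXT: Dict[str, Set[str]] = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(context: Dict[str, Set[str]], attrs: Set[str]) -> Set[str]:
    """Objects that carry every attribute in `attrs`."""
    return {obj for obj, have in context.items() if attrs <= have}


def confidence(context: Dict[str, Set[str]], premise: Set[str], conclusion: Set[str]) -> float:
    """Share of the objects satisfying the premise that also satisfy the conclusion."""
    covered = extent(context, premise)
    if not covered:
        return 0.0
    return len(extent(context, premise | conclusion)) / len(covered)


def counterexamples(context: Dict[str, Set[str]], premise: Set[str], conclusion: Set[str]) -> List[str]:
    """Objects that satisfy the premise but lack part of the conclusion."""
    return sorted(extent(context, premise) - extent(context, conclusion))


print(confidence(CONTEXT, {"c", "d"}, {"a", "b"}))            # implication holds for all
print(round(confidence(CONTEXT, {"a", "b"}, {"c", "d"}), 2))  # association rule
print(counterexamples(CONTEXT, {"a", "b"}, {"c", "d"}))       # candidates for completion
```

An implication is a rule with confidence 1; when the converse rule has a confidence close to 1, its counterexamples are the objects whose description may be incomplete.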
Possible Scenario [Alam et al., IJCAI 2015]
Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
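The last step of the scenario, completing the data once a rule has been accepted as a definition, might look as follows. This is a sketch under assumptions: the function `propose_triples` is hypothetical, the three-person context is a reduced version of the running example, and the index from attribute letters to predicate and object follows the index table of the running example. Whether a rule is accepted as a definition is decided outside this code.

```python
from typing import Dict, List, Set, Tuple

# Attribute letters mapped to (predicate, object), as in the running example.
INDEX: Dict[str, Tuple[str, str]] = {
    "a": ("dc:subject", "dbpc:Computer_Scientists"),
    "b": ("dc:subject", "dbpc:Turing_Award_Laureates"),
    "c": ("dbp:award", "dbp:TuringAward"),
    "d": ("rdf:type", "dbo:Scientist"),
}

# Reduced, illustrative context.
CONTEXT: Dict[str, Set[str]] = {
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def propose_triples(context, left: Set[str], right: Set[str], index) -> List[Tuple[str, str, str]]:
    """For an accepted definition left ≡ right, list the triples to add.

    A subject that satisfies one side completely receives the attributes
    it lacks from the other side; subjects satisfying neither side are
    left untouched.
    """
    triples = []
    for subject in sorted(context):
        attrs = context[subject]
        if left <= attrs:
            missing = right - attrs
        elif right <= attrs:
            missing = left - attrs
        else:
            missing = set()
        for letter in sorted(missing):
            predicate, obj = index[letter]
            triples.append((subject, predicate, obj))
    return triples


NEW_TRIPLES = propose_triples(CONTEXT, {"a", "b"}, {"c", "d"}, INDEX)
for triple in NEW_TRIPLES:
    print(triple)
```

The proposed triples are suggestions for review before they are added to the dataset.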
Experimentation
Dataset building conditions
Dataset      Cars              Videogames    Smartphones       Countries
Restriction  dc:subject        dc:subject    dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type      rdf:type          rdf:type
             dc:subject        dc:subject    dc:subject        dc:subject
             bodyStyle         cp            manufacturer      language
             transmission      developer     operativeSystem   govenmentType
             assembly          requirement   developer         leaderType
             designer          genre         cpu               foundingDate
             layout            releaseDate                     gdpPppRank
FPS: Front_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset
Dataset Characteristics
Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982
Dataset characteristics
Evaluation
- No ground truth for the triples to be added to each dataset
- Human evaluation as the ground truth: each list of implications has a 100% recall
- A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
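How a precision-recall curve can be derived from a ranked list of implications with human judgements is sketched below. The judgement list is invented for illustration (nine accepted implications followed by one rejected), and the function name is hypothetical; the values are not taken from the experiments.

```python
from typing import List, Tuple


def precision_recall_points(judgements: List[bool]) -> List[Tuple[float, float]]:
    """Return one (recall, precision) point per rank.

    `judgements` is the ranked list of implications, with True where the
    human evaluator accepted the implication as a definition. Recall is
    measured against the accepted implications of this list only, so it
    reaches 1.0 once the last accepted implication has been seen.
    """
    total_accepted = sum(judgements)
    points = []
    accepted_so_far = 0
    for rank, accepted in enumerate(judgements, start=1):
        accepted_so_far += accepted
        precision = accepted_so_far / rank
        recall = accepted_so_far / total_accepted if total_accepted else 0.0
        points.append((recall, precision))
    return points


# Invented judgements: 9 of the 10 ranked implications are accepted.
JUDGEMENTS = [True] * 9 + [False]
POINTS = precision_recall_points(JUDGEMENTS)
print(POINTS[-1])  # recall and precision over the whole list
```

With these invented judgements the final point has precision 0.9, the situation described as "9 out of 10 implications can be transformed into definitions".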
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
- Main Topic of Research of a team
- Altering Navigation Space
- Navigation Across Point of Views
- Hide non-interesting parts of lattice
- Search Capabilities
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x
dctermssubject
SELECT x
WHERE
x dctermssubject categoryFrenchFilm
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
Problem Statement
People who were born in Berlinbefore 1900
French Films
Person
x ď 1900
Berlin
rdftype
dbpbirthPlace
dbpbirthDate
SELECT x
WHERE
x rdftype dboPerson
x dbpbirthDate dbpBerlin
x dbpbirthPlace d
FILTER (d lt= 1900)
FrenchFilm
x France
Film
dctermssubject
rdftype
dbohasCountry
SELECT x
WHERE
x rdftype dbpCountry
x dbohasCountry dbpFrance
Mehwish Alam Interactive Knowledge Discovery over Web of Data 36
From RDF Triples to Formal Context
RDF triples
ltPerson1dcsubjectdbpcComputer_Scientistsgt
ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt
ltPerson1dbpfielddbpComputer Sciencesgt
ltPerson1rdftypedboScientistsgt
Predicates ObjectsIndex URI Index URI
A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates
B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates
g dboUnitedKingdom
A B C D Ea b c d e f g
Person1 ˆ ˆ ˆ ˆ ˆ ˆ
Person2 ˆ ˆ ˆ ˆ ˆ
Person3 ˆ ˆ ˆ ˆ ˆ
Person4 ˆ ˆ ˆ ˆ
Person5 ˆ ˆ ˆ ˆ
Person6 ˆ ˆ
Person7 ˆ ˆ
The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt
Mehwish Alam Interactive Knowledge Discovery over Web of Data 37
Getting definitions from implications and association rules
• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete.
Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award.
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist.
$e \Rightarrow c, d$        100%        2        All the people having the field computer science are Turing Award winner scientists.
$c, d \Rightarrow a, b$     100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists.
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award.
Association rules for the running example
• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions.
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
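The confidence values above can be checked with a short Python sketch. It is a simplified illustration under stated assumptions: the context is restricted to the attributes a, b, c and d (Person1 to Person5 carry all four, Person6 and Person7 only a and b, as the rule supports imply), and the helper names and the 0.7 threshold are hypothetical.

```python
# Running example restricted to attributes a, b, c, d (assumption: the
# columns for e, f, g are left out because they do not affect these rules).
context = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in context.items() if attrs <= desc}


def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    if not covered:
        return 0.0
    return len(extent(premise | conclusion)) / len(covered)


def classify(x, y, min_conf=0.7):
    """Label the pair (x, y) as a definition, a candidate definition, or neither."""
    forward, backward = confidence(x, y), confidence(y, x)
    if forward == 1.0 and backward == 1.0:
        return "definition"
    if max(forward, backward) == 1.0 and min(forward, backward) >= min_conf:
        return "candidate definition"
    return "no definition"


def entities_to_complete(x, y):
    """Entities that satisfy x but not yet y."""
    return extent(x) - extent(x | y)
```

Under these assumptions, `classify({"a", "b"}, {"c", "d"})` yields a candidate definition, and `entities_to_complete({"a", "b"}, {"c", "d"})` returns Person6 and Person7, the two entities the slide flags for completion.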
Possible Scenario [Alam et al., IJCAI 2015]
Workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data
Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.
Experimentation
              Cars              Videogames     Smartphones       Countries
Restriction   dc:subject        dc:subject     dc:subject        rdf:type
              dbpc:Sports_cars  dbpc:FPS       dbpc:Smartphones  Country
Predicates    rdf:type          rdf:type       rdf:type          rdf:type
              dc:subject        dc:subject     dc:subject        dc:subject
              bodyStyle         cp             manufacturer      language
              transmission      developer      operativeSystem   governmentType
              assembly          requirement    developer         leaderType
              designer          genre          cpu               foundingDate
              layout            releaseDate                      gdpPppRank
FPS: First_Person_Shooters; cp: computerPlatform
Predicates used to construct each dataset
                Cars    Videogames  Smartphones  Countries
Subjects        529     655         363          3153
Objects         1291    3265        495          8315
Triples         12519   20146       4710         50000
Concepts        14657   31031       1232         13754
Exec. time [s]  17.32   17.14       0.7          59.82
Dataset characteristics
Evaluation
• No ground truth for the triples to be added to each dataset.
• Human evaluation as the ground truth: each list of implications has 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.
[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
Roadmap
Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization
6 Data Analysis through RV-Xplorer
User Interface of RV-Xplorer (Rdf-View eXplorer)
RV-Xplorer (Software Demo)
• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities
7 Conclusion and Perspectives
Conclusion
Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries: Lattice Based View Access
Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer
Interactive Knowledge Discovery + Completion
Interactive Exploration and KDD over Web of Data
Process of Interactively Exploring RDF Data
Perspectives
• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
• Perform large-scale experiments.
Thank you for your attention
8 References
Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.
Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.
Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.
Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.
From FCA to Pattern Structures
• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections and strings.
• To deal with such data, we use Pattern Structures.
Example RDF graph for the entity 350GT:
  <350GT, dc:subject, dbpc:Sports_cars>
  <350GT, dc:subject, dbpc:Lamborghini_vehicles>
  <350GT, rdf:type, dbo:Automobile>
  <350GT, dbo:manufacturer, dbp:Lamborghini>
  <350GT, dbo:productionYear, 1964 (xsd:date)>
Why Pattern Structures
Contexts $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a b c d e f g) and $K_{dbo:productionStartYear}$ (numeric):
Reventon            × × × × × ×   $\langle [2008, 2008] \rangle$
Countach            × × × × ×     $\langle [1974, 1974] \rangle$
350GT               × × × × ×     $\langle [1963, 1963] \rangle$
400GT               × × × ×       $\langle [1965, 1965] \rangle$
Islero              × × × ×       $\langle [1967, 1967] \rangle$
Veneno              × ×           $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×           -
Heterogeneous context with numeric values
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
• Let $p \in P$ be a predicate:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q
˝˝
- pA1q˝ ldquo ta b c du
- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q
˝˝ ldquo tReventonCountach 350GT 400GT Islerou
sbquo pA1q2
- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1
- pA1q2 ldquo A1
sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1
sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq
ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo
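The closure computed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the binary part is treated as a single attribute set rather than split per context, and since the column positions of the crosses cannot be recovered from the table, every cross other than a, b, c, d for the first five cars is an assumption chosen to match the number of crosses per row.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Interval = Tuple[int, int]

# Binary attributes per car. Only a, b, c, d for the first five cars follow
# from the slide; all other cross positions are assumptions.
BINARY: Dict[str, FrozenSet[str]] = {
    "Reventon": frozenset("abcdef"),
    "Countach": frozenset("abcde"),
    "350GT": frozenset("abcde"),
    "400GT": frozenset("abcd"),
    "Islero": frozenset("abcd"),
    "Veneno": frozenset("ab"),
    "Aventador Roadster": frozenset("ab"),
}

# dbo:productionStartYear as a degenerate interval; the Aventador Roadster
# has no value in the table, so it is simply absent here.
YEAR: Dict[str, Interval] = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
}


def common_attributes(cars: Iterable[str]) -> FrozenSet[str]:
    """Binary derivation: attributes shared by every car."""
    return frozenset.intersection(*(BINARY[c] for c in cars))


def cars_with(attrs: FrozenSet[str]) -> Set[str]:
    """Binary derivation: cars owning every attribute in attrs."""
    return {c for c, owned in BINARY.items() if attrs <= owned}


def year_hull(cars: Iterable[str]) -> Interval:
    """Interval derivation: smallest interval covering every car's year.

    Assumes every car passed in has a year.
    """
    years = [YEAR[c] for c in cars]
    return (min(y[0] for y in years), max(y[1] for y in years))


def cars_within(interval: Interval) -> Set[str]:
    """Interval derivation: cars whose year interval lies inside interval."""
    lo, hi = interval
    return {c for c, (a, b) in YEAR.items() if lo <= a and b <= hi}


def heterogeneous_closure(cars: Iterable[str]):
    """Return (extent, intent) of the heterogeneous closure of cars."""
    cars = list(cars)
    attrs = common_attributes(cars)
    interval = year_hull(cars)
    extent = cars_with(attrs) & cars_within(interval)
    return extent, (attrs, interval)


extent, intent = heterogeneous_closure(["350GT", "400GT", "Islero"])
print(sorted(extent), sorted(intent[0]), intent[1])
```

The extent is the intersection of the two component extents, which mirrors the last step of the slide: the binary closure alone also admits Reventon and Countach, and the interval component removes them.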
The proposition of [Carpineto and Romano 1996]
- Let $K = (G, M, I)$ be a formal context.
- The extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^{*}}$.
- Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x x
s2 x x x
s3 x x
s4 x x x
s5 x x
[Figure: subsumption hierarchy with top element $\top$ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
The proposition of [Ganter and Kuznetsov 2001]
- In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
- More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
- Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$.
- The maximal elements of an ideal are not comparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
- Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
- In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
- For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$.
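The infimum of two antichains defined above can be illustrated with a small Python sketch. It is a direct, unoptimized reading of the definition, with hypothetical function names. The poset used is an assumption made for the example: it is the tree from the RMQ slides of this deck, ordered so that $x \le y$ when $y$ is $x$ itself or one of its ancestors.

```python
from typing import Dict, Iterable, Optional, Set

# child -> parent; "TOP" is the root of the tree (assumed example poset).
PARENT: Dict[str, Optional[str]] = {
    "TOP": None,
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def leq(x: str, y: str) -> bool:
    """True when x <= y, i.e. y is x itself or an ancestor of x."""
    node: Optional[str] = x
    while node is not None:
        if node == y:
            return True
        node = PARENT[node]
    return False


def order_ideal(antichain: Iterable[str]) -> Set[str]:
    """Downset generated by the antichain."""
    generators = list(antichain)
    return {m for m in PARENT if any(leq(m, a) for a in generators)}


def maximal_elements(elements: Set[str]) -> Set[str]:
    """Elements that lie below no other element of the set."""
    return {x for x in elements
            if not any(x != y and leq(x, y) for y in elements)}


def infimum(ac1: Iterable[str], ac2: Iterable[str]) -> Set[str]:
    """Maximal elements of the ideal of common lower bounds."""
    return maximal_elements(order_ideal(ac1) & order_ideal(ac2))


print(sorted(infimum({"C10", "C4"}, {"C1", "C11"})))
print(sorted(infimum({"C12"}, {"C10", "C13"})))
```

In the first call the two downsets share only C1 and C4, which are incomparable, so both are returned. In the second call the shared downset is C10 with its two leaves, and only C10 is maximal.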
Reduction of LCS to RMQ
- Computing the LCS within a tree can be reduced to the Range Minimum Query (RMQ) problem.
- Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.
Array     [ 2 1 0 3 2 ]
Positions   1 2 3 4 5
Computational Time
The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
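As an illustration of constant-time range minimum queries, here is a minimal sparse-table sketch in Python. Note the swapped-in technique: a sparse table needs $O(n \log n)$ preprocessing, so it is simpler than, and not the same as, the $O(n)$ preprocessing scheme quoted above. The class and method names are hypothetical, and positions are 1-based as on the slide.

```python
from typing import List


class SparseTableRMQ:
    """Range minimum queries with O(n log n) preprocessing, O(1) per query."""

    def __init__(self, values: List[int]) -> None:
        self.values = values
        n = len(values)
        # table[k][i] is the 0-based index of the minimum of values[i : i + 2**k].
        self.table: List[List[int]] = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                # "<=" keeps the leftmost position when values tie.
                row.append(left if values[left] <= values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, i: int, j: int) -> int:
        """1-based position of the minimum among positions i..j (inclusive)."""
        lo, hi = (i, j) if i <= j else (j, i)
        lo -= 1  # convert to a 0-based half-open range [lo, hi)
        k = (hi - lo).bit_length() - 1
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k)]
        best = left if self.values[left] <= self.values[right] else right
        return best + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(1, 5), rmq.query(4, 5), rmq.query(1, 2))
```

Each query combines two overlapping power-of-two blocks that together cover the range, which is why no loop is needed at query time.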
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
- $RMQ(4, 4) = 4$, partial result $\{4\}$
- $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
- $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$
- Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the nodes $C_1$ and $C_{13}$ once only the most specific nodes are kept
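The pairwise queries above can be reproduced with a short Python sketch. It is a minimal illustration with hypothetical names: the tree is written out as a children map read off the Euler tour on the slide, and each RMQ is answered by a plain linear scan rather than a constant-time structure, since only the set of queries matters here.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Tree read off the Euler tour on the slide; "TOP" stands for the root.
CHILDREN: Dict[str, List[str]] = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root: str) -> Tuple[List[str], List[int]]:
    """Visit order and depths, re-recording a node after each child."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, D = euler_tour("TOP")
FIRST: Dict[str, int] = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)  # 1-based first occurrence of each node


def rmq(i: int, j: int) -> int:
    """1-based position of the leftmost minimum depth between i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def is_strict_ancestor(u: str, v: str) -> bool:
    """u is a proper ancestor of v when their LCS is u itself."""
    return u != v and NODES[rmq(FIRST[u], FIRST[v]) - 1] == u


def naive_intersection(a: Iterable[str], b: Iterable[str]):
    """One RMQ per pair, then keep only the most specific nodes."""
    pos_b = [FIRST[y] for y in b]
    positions: Set[int] = {rmq(FIRST[x], j) for x in a for j in pos_b}
    candidates = {NODES[p - 1] for p in positions}
    antichain = {u for u in candidates
                 if not any(is_strict_ancestor(u, v) for v in candidates)}
    return positions, antichain


positions, antichain = naive_intersection(["C1", "C5", "C8"], ["C1", "C7", "C9"])
print(sorted(positions), sorted(antichain))
```

The nested comprehension issues one query per pair of elements, which is where the quadratic number of RMQs in the naive approach comes from.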
An associated scaling
- Scale the antichains to the corresponding filters.
- A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.
[Figure: tree rooted at $\top$; $\top$ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]
$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$
$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$
$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
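The scaling to filters can be sketched as follows. This is a minimal Python illustration with hypothetical names, using a child-to-parent map for the tree of this slide. Because a filter in a tree is closed under taking parents, the most specific elements of the intersection are exactly those that are not the parent of another element in it.

```python
from typing import Dict, Iterable, Optional, Set

# child -> parent for the tree on the slide; "TOP" stands for the root.
PARENT: Dict[str, Optional[str]] = {
    "TOP": None,
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain: Iterable[str]) -> Set[str]:
    """All nodes that are an element of the antichain or one of its ancestors."""
    result: Set[str] = set()
    for start in antichain:
        node: Optional[str] = start
        while node is not None and node not in result:
            result.add(node)
            node = PARENT[node]
    return result


def most_specific(upward_closed: Set[str]) -> Set[str]:
    """Drop every node that is the parent of another node in the set."""
    parents = {PARENT[n] for n in upward_closed}
    return upward_closed - parents


def intersect_by_scaling(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    return most_specific(filter_of(a) & filter_of(b))


fil_a = filter_of(["C1", "C5", "C8"])
fil_b = filter_of(["C1", "C7", "C9"])
print(sorted(fil_a & fil_b))
print(sorted(intersect_by_scaling(["C1", "C5", "C8"], ["C1", "C7", "C9"])))
```

Each filter may contain a large part of the tree, which is the cost that the complexity discussion attributes to this approach.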
Complexity
Complexity of Naive Approach
The number of RMQs is $O(|A||B|)$.
Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.
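The slides give only the cost of the improved approach, not its steps. The Python sketch below is therefore one assumed reading of it, not the authors' algorithm: the positions of $A$ and $B$ are merged in sorted order, and an RMQ is issued only for neighbouring positions that come from different antichains. Names are hypothetical, the Euler tour and depth array are copied from the Naive Approach slide, and each RMQ is a linear scan for brevity.

```python
from typing import Iterable, List, Set, Tuple

# Euler tour and depths copied from the Naive Approach slide ("TOP" is the root).
NODES: List[str] = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
                    "TOP C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
D: List[int] = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
                1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i: int, j: int) -> int:
    """1-based position of the leftmost minimum depth between i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def consecutive_rmqs(pos_a: Iterable[int],
                     pos_b: Iterable[int]) -> Tuple[Set[int], int]:
    """RMQ positions for neighbours in the sorted merge, plus the query count.

    Assumption: only neighbours coming from different antichains are queried.
    """
    merged = sorted([(p, "A") for p in pos_a] + [(p, "B") for p in pos_b])
    positions: Set[int] = set()
    queries = 0
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:
            positions.add(rmq(p, q))
            queries += 1
    return positions, queries


positions, queries = consecutive_rmqs([4, 12, 20], [4, 18, 22])
print(sorted(positions), queries)
print(sorted({NODES[p - 1] for p in positions}))
```

On the example of the slides this issues 5 queries instead of the 9 pairwise ones and reaches the same set of positions; the most specific nodes among them are again $C_1$ and $C_{13}$.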
About complexity of the approach
- The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
- When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
- Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
Conclusion
- Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice Based View Access
- Ways to utilize this structure
  - Completing RDF-Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer
- Interactive Knowledge Discovery + Completion
Interactive Exploration and KDD over Web of Data
Process of Interactively Exploring RDF Data
Perspectives
- Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
- How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.
- Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
- Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
- Perform large-scale experiments.
Thank you for your attention
References
Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.
Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.
Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.
Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.
Mehwish Alam Interactive Knowledge Discovery over Web of Data 59
From FCA to Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
dbpcSports_cars dbpcLamborghini_vehicles
1964(xsddate)
350GT
dboAutomobile dbpLamborghini
dcsubject dcsubject
rdftype dbomanufacturer
dboproductionYear
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Why Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Heterogeneous context with numeric values
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Heterogeneous Pattern Structures[Codocedo and Napoli 2014]
sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation
sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp
sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo
Ś
Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp
Mehwish Alam Interactive Knowledge Discovery over Web of Data 61
Heterogeneous Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q
˝˝
- pA1q˝ ldquo ta b c du
- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q
˝˝ ldquo tReventonCountach 350GT 400GT Islerou
sbquo pA1q2
- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1
- pA1q2 ldquo A1
sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1
sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq
ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo
Mehwish Alam Interactive Knowledge Discovery over Web of Data 62
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 63
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 64
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 65
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 66
The proposition of [Ganter and Kuznetsov 2001]
sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets
sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order
sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu
sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal
sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable
Mehwish Alam Interactive Knowledge Discovery over Web of Data 67
The proposition of [Ganter and Kuznetsov 2001]
sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set
sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Reduction of LCS to RMQ
sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query
sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array
Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5
Computational Time
The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 4q ldquo 4t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquot4 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquo 15t4 15u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquot4 15 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u
B ldquo tC1C7C9u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u FilpAq ldquo tC1C10C12JC5C11C8C13C6u
B ldquo tC1C7C9u FilpBq ldquo tC1C10C12JC7C9C13C6u
FilpAq X FilpBq ldquo tC1C10C12JC13C6u which yields the antichain tC1C13u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
Complexity
Complexity of Naive Approach
The number of RMQs of consecutive elements is Op|A||B|q
Complexity of Improved Approach
The cardinality of AY B is less then |A| ` |B| hence the number of theconsecutive elements is Op|A| ` |B|q and thus the number of RMQs ofconsecutive elements is Op|A| ` |B|q
Complexity of associated scaling
The approach based on lters has a higher complexity than the approach basedon RMQ The size of a lter is Op|T |q and thus the computational complexityof intersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
Mehwish Alam Interactive Knowledge Discovery over Web of Data 71
About complexity of the approach
sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree
sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction
sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling
Mehwish Alam Interactive Knowledge Discovery over Web of Data 72
- Motivation
- Background
-
- Fundamentals of Formal Concept Analysis
- Pattern Structures
-
- RDF Pattern Structures
- Creating Views over RDF-Graphs
-
- View By
-
- Completing RDF Data
-
- Motivation
- Methodology
- Experimentation amp Evaluation
-
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
-
- Heterogeneous Pattern Structures
- RDF Pattern Structures
-
Conclusion
Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access
Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer
InteractiveKnowledgeDiscovery +Completion
Mehwish Alam Interactive Knowledge Discovery over Web of Data 47
Interactive Exploration and KDD over Web of Data
Process of Interactively Exploring RDF Data
Mehwish Alam Interactive Knowledge Discovery over Web of Data 48
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 49
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 50
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 51
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 52
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 53
Thank you for your attention
Mehwish Alam Interactive Knowledge Discovery over Web of Data 54
Mehwish Alam Interactive Knowledge Discovery over Web of Data 55
8 References
References I
Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.
Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.
Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.
References II
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.
Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.
References III
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.
References IV
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.
From FCA to Pattern Structures
• Triples do not always contain URIs as objects
• They may contain different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures
[Figure: RDF graph for the resource 350GT, with the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edges labelled dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]
Why Pattern Structures
Sub-contexts: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a, b, c, d, e, f, g) and $K_{dbo:productionStartYear}$ (intervals)

Entity              Crosses        dbo:productionStartYear
Reventon            × × × × × ×    ⟨[2008, 2008]⟩
Countach            × × × × ×      ⟨[1974, 1974]⟩
350GT               × × × × ×      ⟨[1963, 1963]⟩
400GT               × × × ×        ⟨[1965, 1965]⟩
Islero              × × × ×        ⟨[1967, 1967]⟩
Veneno              × ×            ⟨[2012, 2012]⟩
Aventador Roadster  × ×            -

Table: Heterogeneous context with numeric values
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
Heterogeneous Pattern Structures
Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:
• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$
"Automobiles manufactured by Lamborghini between 1963 and 1967"
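The closure computed on this slide can be replayed with a minimal Python sketch. The year intervals and the number of crosses per row follow the heterogeneous context; the exact columns of the crosses beyond a, b, c, d are not recoverable from the slide, so the choices of e, f and g below are assumptions made only for illustration, as are all function names.

```python
# Binary part of the context. Every row of A1's closure carries a, b, c, d;
# which of e, f, g are set (and both attributes of Veneno and Aventador
# Roadster) is an ASSUMPTION, only the number of crosses follows the slide.
BINARY = {
    "Reventon": set("abcdef"),
    "Countach": set("abcde"),
    "350GT": set("abcdg"),
    "400GT": set("abcd"),
    "Islero": set("abcd"),
    "Veneno": set("ef"),
    "Aventador Roadster": set("eg"),
}

# Interval part (dbo:productionStartYear); None marks the missing value.
YEAR = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
    "Aventador Roadster": None,
}


def common_attributes(entities):
    """Binary derivation: attributes shared by all given entities."""
    return set.intersection(*(BINARY[g] for g in entities))


def entities_with(attributes):
    """Binary derivation: entities carrying all given attributes."""
    return {g for g, attrs in BINARY.items() if attributes <= attrs}


def interval_of(entities):
    """Interval derivation: smallest interval covering the entities' years."""
    lows, highs = zip(*(YEAR[g] for g in entities))
    return (min(lows), max(highs))


def entities_in(interval):
    """Interval derivation: entities whose year interval lies inside `interval`."""
    low, high = interval
    return {g for g, y in YEAR.items() if y is not None and low <= y[0] and y[1] <= high}


def heterogeneous_closure(entities):
    """Intersect the closures obtained in the binary and the interval part."""
    binary_closure = entities_with(common_attributes(entities))
    interval_closure = entities_in(interval_of(entities))
    return binary_closure & interval_closure


A1 = {"350GT", "400GT", "Islero"}
print(sorted(common_attributes(A1)), interval_of(A1), sorted(heterogeneous_closure(A1)))
```

Under these assumptions the binary closure alone also contains Reventon and Countach; it is the interval component that restricts the extent back to $A_1$.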
The proposition of [Carpineto and Romano 1996]
• Let $K = (G, M, I)$ be a formal context
• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S over the attributes C1, C2, C4, C5, C6, C7, C8, C9:
Entity  Crosses
s1      x x x
s2      x x x
s3      x x
s4      x x x
s5      x x

[Figure: subsumption hierarchy over the extended attribute set, with top element ⊤ and nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
The proposition of [Ganter and Kuznetsov 2001]
• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set $\max(Idl)$ of its maximal elements: $Idl = \{x \mid \exists y \in \max(Idl),\ x \le y\}$
• The maximal elements of an ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$
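The definition of the infimum of two antichains can be made concrete with a short Python sketch. The five-element poset, its covering pairs and all function names are hypothetical and chosen only to keep the example small; the computation itself follows the definition literally (common lower bounds, then maximal elements) and is not meant to be efficient.

```python
# Hypothetical toy poset given by its covering pairs (lower, upper):
# a <= c, b <= c, b <= d, e <= d.
COVERS = [("a", "c"), ("b", "c"), ("b", "d"), ("e", "d")]
ELEMENTS = {x for pair in COVERS for x in pair}


def upset(x):
    """All elements y with x <= y (reflexive-transitive closure of COVERS)."""
    seen, stack = {x}, [x]
    while stack:
        current = stack.pop()
        for lower, upper in COVERS:
            if lower == current and upper not in seen:
                seen.add(upper)
                stack.append(upper)
    return seen


def leq(x, y):
    """True if x <= y in the toy poset."""
    return y in upset(x)


def maximal(elements):
    """Maximal elements of a set, i.e. the antichain describing its downset."""
    return {x for x in elements if not any(x != y and leq(x, y) for y in elements)}


def infimum(ac1, ac2):
    """Maximal elements of {m | m <= some ac1 in AC1 and m <= some ac2 in AC2}."""
    ideal = {
        m for m in ELEMENTS
        if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)
    }
    return maximal(ideal)


print(infimum({"c"}, {"d"}))       # only b lies below both c and d
print(infimum({"c", "d"}, {"c"}))  # the smaller ideal is recovered
```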
Reduction of LCS to RMQ
• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array
Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5
Computational Time
The problem of finding such a position can be solved with $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array
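As an illustration of RMQ with constant-time queries, the sketch below uses a sparse table. This is a swapped-in, simpler technique: its preprocessing takes $O(n \log n)$ time rather than the $O(n)$ bound quoted on the slide, which requires a more involved construction. Function names are illustrative, and positions are 1-based to match the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the 0-based index of a minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    span = 1
    while 2 * span <= n:
        previous = table[-1]
        row = []
        for i in range(n - 2 * span + 1):
            left, right = previous[i], previous[i + span]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        span *= 2
    return table


def rmq(values, table, lo, hi):
    """Position (1-based) of a minimal value among positions lo..hi (inclusive)."""
    i, j = sorted((lo - 1, hi - 1))
    k = (j - i + 1).bit_length() - 1  # largest power of two fitting in the range
    left = table[k][i]
    right = table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 1, 5))  # position of the value 0
print(rmq(ARRAY, TABLE, 4, 5))  # position of the second value 2
```

Each query combines two overlapping blocks of length $2^k$, which is why no loop is needed at query time.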
Naive Approach
For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the positions in the depth array $D$ below give $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$, i.e. the antichain $\{C1, C13\}$
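The naive approach can be sketched in Python as follows. The tree is the one encoded by the depth array and node labels of the slide, with `"TOP"` standing for ⊤; the function names are illustrative, and the range minimum is found by a plain linear scan instead of a preprocessed RMQ structure, to keep the sketch short.

```python
from itertools import product

# Tree read off the traversal shown on the slide ("TOP" stands for the top element).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Visit the tree depth-first, recording a node again after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")
FIRST = {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)  # 0-based first occurrence of each node


def lcs(u, v):
    """Node of minimal depth between the first occurrences of u and v."""
    lo, hi = sorted((FIRST[u], FIRST[v]))
    best = min(range(lo, hi + 1), key=DEPTHS.__getitem__)  # linear-scan RMQ
    return NODES[best]


def naive_intersection(a, b):
    """One RMQ per pair (x, y), then keep only the most specific results."""
    candidates = {lcs(x, y) for x, y in product(a, b)}
    return {
        c for c in candidates
        if not any(c != d and lcs(c, d) == c for d in candidates)
    }


print(sorted(naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

The candidates produced for the example are C1, C12, ⊤ and C13; the final filter drops C12 and ⊤ because they subsume C1.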
An associated scaling
• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain
[Figure: tree with root ⊤ and nodes C12, C10, C1, C2, C11, C4, C5, C6, C13, C7, C8, C9]
$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$
$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$
$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
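The scaling to filters can be sketched in Python as below. The parent map encodes the same tree as the example, with `"TOP"` standing for ⊤; function names are illustrative assumptions, and the minimal elements are extracted with a simple quadratic check.

```python
# Parent of every non-root node of the example tree ("TOP" stands for the top element).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes that are equal to or above some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)  # None once the root has been passed
    return result


def minimal_elements(nodes):
    """Nodes of the set that have no other node of the set below them."""
    return {
        n for n in nodes
        if not any(m != n and n in filter_of({m}) for m in nodes)
    }


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common))
print(sorted(minimal_elements(common)))
```

Every filter contains the whole path up to the root, which is why its size grows with the tree rather than with the antichain.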
Complexity
Complexity of Naive Approach
The number of RMQs is $O(|A||B|)$, one per pair of elements taken from $A$ and $B$
Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
About complexity of the approach
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling
- Motivation
- Background
- Fundamentals of Formal Concept Analysis
- Pattern Structures
- RDF Pattern Structures
- Creating Views over RDF-Graphs
- View By
- Completing RDF Data
- Motivation
- Methodology
- Experimentation & Evaluation
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
- Heterogeneous Pattern Structures
- RDF Pattern Structures
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 49
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 50
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 51
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 52
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 53
Thank you for your attention
Mehwish Alam Interactive Knowledge Discovery over Web of Data 54
Mehwish Alam Interactive Knowledge Discovery over Web of Data 55
8References
References I
Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis
Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754
Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer
Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg
Mehwish Alam Interactive Knowledge Discovery over Web of Data 56
References II
Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial
Intelligence pages 13421347
Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13
2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer
Mehwish Alam Interactive Knowledge Discovery over Web of Data 57
References III
Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June
23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer
Science pages 153168 Springer
Rudolph S (2006)Relational exploration combining description logics and formal concept
analysis for knowledge specicationPhD thesis Dresden University of Technology
Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg
Mehwish Alam Interactive Knowledge Discovery over Web of Data 58
References IV
Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473
Mehwish Alam Interactive Knowledge Discovery over Web of Data 59
From FCA to Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
dbpcSports_cars dbpcLamborghini_vehicles
1964(xsddate)
350GT
dboAutomobile dbpLamborghini
dcsubject dcsubject
rdftype dbomanufacturer
dboproductionYear
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Why Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Heterogeneous context with numeric values
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Heterogeneous Pattern Structures[Codocedo and Napoli 2014]
sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation
sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp
sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo
Ś
Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp
Mehwish Alam Interactive Knowledge Discovery over Web of Data 61
Heterogeneous Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q
˝˝
- pA1q˝ ldquo ta b c du
- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q
˝˝ ldquo tReventonCountach 350GT 400GT Islerou
sbquo pA1q2
- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1
- pA1q2 ldquo A1
sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1
sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq
ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo
Mehwish Alam Interactive Knowledge Discovery over Web of Data 62
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 63
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 64
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 65
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 66
The proposition of [Ganter and Kuznetsov 2001]
sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets
sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order
sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu
sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal
sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable
Mehwish Alam Interactive Knowledge Discovery over Web of Data 67
The proposition of [Ganter and Kuznetsov 2001]
sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set
sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Reduction of LCS to RMQ
sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query
sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array
Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5
Computational Time
The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 4q ldquo 4t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquot4 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquo 15t4 15u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquot4 15 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u
B ldquo tC1C7C9u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u FilpAq ldquo tC1C10C12JC5C11C8C13C6u
B ldquo tC1C7C9u FilpBq ldquo tC1C10C12JC7C9C13C6u
FilpAq X FilpBq ldquo tC1C10C12JC13C6u which yields the antichain tC1C13u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
Complexity
Complexity of Naive Approach
The number of RMQs of consecutive elements is Op|A||B|q
Complexity of Improved Approach
The cardinality of AY B is less then |A| ` |B| hence the number of theconsecutive elements is Op|A| ` |B|q and thus the number of RMQs ofconsecutive elements is Op|A| ` |B|q
Complexity of associated scaling
The approach based on lters has a higher complexity than the approach basedon RMQ The size of a lter is Op|T |q and thus the computational complexityof intersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
Mehwish Alam Interactive Knowledge Discovery over Web of Data 71
About complexity of the approach
sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree
sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction
sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling
Mehwish Alam Interactive Knowledge Discovery over Web of Data 72
- Motivation
- Background
-
- Fundamentals of Formal Concept Analysis
- Pattern Structures
-
- RDF Pattern Structures
- Creating Views over RDF-Graphs
-
- View By
-
- Completing RDF Data
-
- Motivation
- Methodology
- Experimentation amp Evaluation
-
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
-
- Heterogeneous Pattern Structures
- RDF Pattern Structures
-
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Thank you for your attention
8 References
References I
Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.
Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.
Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.
References II
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.
Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.
References III
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.
References IV
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.
From FCA to Pattern Structures
• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections, and strings.
• To deal with such data we use Pattern Structures.
[Figure: RDF graph centered on the entity 350GT. Nodes: dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini, and the literal 1964 (xsd:date). Edge labels: dc:subject (twice), rdf:type, dbo:manufacturer, dbo:productionYear.]
Why Pattern Structures
Entity               Crosses over attributes a-g (contexts K_A to K_E)   K_dbo:productionStartYear
Reventon             × × × × × ×                                         ⟨[2008, 2008]⟩
Countach             × × × × ×                                           ⟨[1974, 1974]⟩
350GT                × × × × ×                                           ⟨[1963, 1963]⟩
400GT                × × × ×                                             ⟨[1965, 1965]⟩
Islero               × × × ×                                             ⟨[1967, 1967]⟩
Veneno               × ×                                                 ⟨[2012, 2012]⟩
Aventador Roadster   × ×                                                 -
Heterogeneous context with numeric values
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
Heterogeneous Pattern Structures
Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$ in the heterogeneous context above; then:
• $(A_1)^{\circ\circ}$ (binary part):
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$ (interval part):
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$
"Automobiles manufactured by Lamborghini between 1963 and 1967"
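The closure computed on this slide can be replayed with a minimal sketch. It is only an illustration: the binary attribute sets below are hypothetical (the positions of the crosses are not recoverable from the transcript) and were chosen to agree with the derivations above, the production start years are those of the context, and Aventador Roadster is left out because it has no start year. All binary contexts are merged into a single attribute set for brevity.

```python
# entity -> (binary attributes, production start year)
# The attribute sets are assumptions; only the years come from the context.
CONTEXT = {
    "Reventon": ({"a", "b", "c", "d", "e", "f"}, 2008),
    "Countach": ({"a", "b", "c", "d", "e"}, 1974),
    "350GT": ({"a", "b", "c", "d", "f"}, 1963),
    "400GT": ({"a", "b", "c", "d"}, 1965),
    "Islero": ({"a", "b", "c", "d"}, 1967),
    "Veneno": ({"e", "g"}, 2012),
}


def describe(entities):
    """Common description of a non-empty entity set: the shared binary
    attributes and the smallest interval covering all start years."""
    attrs = set.intersection(*(CONTEXT[g][0] for g in entities))
    years = [CONTEXT[g][1] for g in entities]
    return frozenset(attrs), (min(years), max(years))


def extent(description):
    """Entities that have all the attributes and whose year is in the interval."""
    attrs, (low, high) = description
    return {g for g, (g_attrs, year) in CONTEXT.items()
            if attrs <= g_attrs and low <= year <= high}


a1 = {"350GT", "400GT", "Islero"}
intent = describe(a1)     # shared attributes and the interval (1963, 1967)
closure = extent(intent)  # entities matching both components
binary_only = {g for g, (g_attrs, _) in CONTEXT.items() if intent[0] <= g_attrs}
```

The binary component alone keeps five entities; adding the interval component restricts the extent back to $A_1$, which is why $A_1$ is closed in the heterogeneous setting.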
The proposition of [Carpineto and Romano 1996]
• Let $K = (G, M, I)$ be a formal context.
• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.
[Table: formal context with entities s1 to s5 and attributes C1, C2, C4, C5, C6, C7, C8, C9.]
[Figure: subsumption hierarchy with top element $\top$ over the nodes C1, C2, and C4 to C15.]
The proposition of [Ganter and Kuznetsov 2001]
• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$.
• The maximal elements of an ideal are pairwise incomparable, i.e. they form an antichain, and conversely, each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
The proposition of [Ganter and Kuznetsov 2001]
• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.
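The infimum of two antichains can be computed directly from this definition. The sketch below does so by brute force on a small hypothetical poset (the element names and the `PARENTS` links are illustrative assumptions, and the poset is deliberately not a tree); it follows the definition literally and makes no attempt at efficiency.

```python
# Hypothetical poset: PARENTS[x] lists the elements immediately below x,
# so m <= x holds when m is x itself or is reachable from x through PARENTS.
PARENTS = {
    "t": [],
    "u": ["t"],
    "v": ["t"],
    "x": ["u"],
    "y": ["u", "v"],
    "z": ["v"],
}


def down_set(element):
    """All m with m <= element (the principal order ideal of `element`)."""
    seen, stack = set(), [element]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def ideal(antichain):
    """Order ideal generated by an antichain."""
    return set().union(*(down_set(a) for a in antichain))


def maximal_elements(elements):
    """Elements that are not strictly below another element of the set."""
    return {m for m in elements
            if not any(m != n and m in down_set(n) for n in elements)}


def meet(ac1, ac2):
    """Maximal elements of {m | m <= some ac1 and m <= some ac2}."""
    return maximal_elements(ideal(ac1) & ideal(ac2))


meet_xy_yz = meet({"x", "y"}, {"y", "z"})
meet_x_z = meet({"x"}, {"z"})
meet_x_y = meet({"x"}, {"y"})
```

The set of elements below some element of each antichain is exactly the intersection of the two generated ideals, which is what `meet` relies on.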
Reduction of LCS to RMQ
• The computation of the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.
Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5
Computational Time
The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array).
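A range minimum query structure can be sketched as follows. Note the swap: this is a sparse table, which answers queries in $O(1)$ but needs $O(n \log n)$ preprocessing, so it is simpler than (and not the same as) the $O(n)$-preprocessing structure the slide refers to. The class name `SparseTableRMQ` is an assumption made for this example.

```python
class SparseTableRMQ:
    """Sparse-table RMQ: O(n log n) preprocessing, O(1) per query."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] = index of a minimum of values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, lo, hi):
        """0-based index of a minimum of values[lo..hi], both ends inclusive."""
        if lo > hi:
            lo, hi = hi, lo
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping blocks of length 2**k cover the whole range.
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        return left if self.values[left] <= self.values[right] else right


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
position = rmq.query(0, 4) + 1  # convert to the 1-based positions of the slide
```

On the array of the slide, the minimum over the whole range is the value 0 at position 3.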
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions in the Euler tour are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.
Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
• $RMQ(4, 4) = 4$, partial result $\{4\}$
• $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
• $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$
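The naive approach can be replayed with a short sketch. The tree is the one encoded by the Euler tour above, with `"TOP"` standing for $\top$; the function names are assumptions, and a linear scan stands in for a constant-time RMQ structure to keep the example small.

```python
# Tree from the Euler tour, as child -> parent links ("TOP" is the top element).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(root="TOP"):
    """Visit order and depths; a node is recorded again after each child."""
    children = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour()
FIRST = {}
for index, node in enumerate(NODES):
    FIRST.setdefault(node, index)  # first (0-based) position of each node


def lcs(a, b):
    """Least common subsumer: the shallowest node between the two positions."""
    lo, hi = sorted((FIRST[a], FIRST[b]))
    best = min(range(lo, hi + 1), key=DEPTHS.__getitem__)  # linear-scan RMQ
    return NODES[best]


def is_ancestor(a, b):
    """True if a is a strict ancestor of b."""
    while b in PARENT:
        b = PARENT[b]
        if b == a:
            return True
    return False


def naive_intersection(ac1, ac2):
    """One RMQ per pair, then keep only the most specific results."""
    candidates = {lcs(a, b) for a in ac1 for b in ac2}
    return {c for c in candidates
            if not any(is_ancestor(c, d) for d in candidates)}


result = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

With 0-based indices the positions are shifted by one with respect to the slide (for instance `FIRST["C1"]` is 3 rather than 4).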
An associated scaling
• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain.
⊤
  C12
    C10
      C1
      C2
    C11
      C4
      C5
  C6
    C13
      C7
      C8
      C9
$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$
$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$
$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
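The scaling to filters can be sketched in a few lines. The tree is the same one as on the slide, with `"TOP"` standing for $\top$; the helper names `filter_of` and `most_specific` are assumptions made for this illustration.

```python
# Tree from the slide, as child -> parent links ("TOP" is the top element).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """Every element of the antichain together with all of its ancestors."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def most_specific(elements):
    """Drop every element that is an ancestor of another element of the set."""
    ancestors = set()
    for node in elements:
        parent = PARENT.get(node)
        while parent is not None:
            ancestors.add(parent)
            parent = PARENT.get(parent)
    return elements - ancestors


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
antichain = most_specific(common)
```

Each filter may contain up to all nodes of the tree, which is where the $O(|T|)$ cost of this approach comes from.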
Complexity
Complexity of Naive Approach
The number of RMQs is $O(|A||B|)$, one per pair of elements from $A$ and $B$.
Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.
Mehwish Alam Interactive Knowledge Discovery over Web of Data 71
About complexity of the approach
sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree
sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction
sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling
Mehwish Alam Interactive Knowledge Discovery over Web of Data 72
- Motivation
- Background
-
- Fundamentals of Formal Concept Analysis
- Pattern Structures
-
- RDF Pattern Structures
- Creating Views over RDF-Graphs
-
- View By
-
- Completing RDF Data
-
- Motivation
- Methodology
- Experimentation amp Evaluation
-
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
-
- Heterogeneous Pattern Structures
- RDF Pattern Structures
-
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 50
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 51
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 52
Perspectives
sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]
sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence
sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc
sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data
sbquo Perform large-scale experiments
Mehwish Alam Interactive Knowledge Discovery over Web of Data 53
Thank you for your attention
Mehwish Alam Interactive Knowledge Discovery over Web of Data 54
Mehwish Alam Interactive Knowledge Discovery over Web of Data 55
8References
References I
Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis
Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754
Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer
Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg
Mehwish Alam Interactive Knowledge Discovery over Web of Data 56
References II
Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial
Intelligence pages 13421347
Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13
2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer
Mehwish Alam Interactive Knowledge Discovery over Web of Data 57
References III
Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June
23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer
Science pages 153168 Springer
Rudolph S (2006)Relational exploration combining description logics and formal concept
analysis for knowledge specicationPhD thesis Dresden University of Technology
Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg
Mehwish Alam Interactive Knowledge Discovery over Web of Data 58
References IV
Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473
Mehwish Alam Interactive Knowledge Discovery over Web of Data 59
From FCA to Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
dbpcSports_cars dbpcLamborghini_vehicles
1964(xsddate)
350GT
dboAutomobile dbpLamborghini
dcsubject dcsubject
rdftype dbomanufacturer
dboproductionYear
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Why Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Heterogeneous context with numeric values
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Heterogeneous Pattern Structures[Codocedo and Napoli 2014]
sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation
sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp
sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo
Ś
Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp
Mehwish Alam Interactive Knowledge Discovery over Web of Data 61
Heterogeneous Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q
˝˝
- pA1q˝ ldquo ta b c du
- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q
˝˝ ldquo tReventonCountach 350GT 400GT Islerou
sbquo pA1q2
- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1
- pA1q2 ldquo A1
sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1
sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq
ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo
Mehwish Alam Interactive Knowledge Discovery over Web of Data 62
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 63
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 64
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 65
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 66
The proposition of [Ganter and Kuznetsov 2001]
sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets
sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order
sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu
sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal
sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable
Mehwish Alam Interactive Knowledge Discovery over Web of Data 67
The proposition of [Ganter and Kuznetsov 2001]
sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set
sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Reduction of LCS to RMQ
sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query
sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array
Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5
Computational Time
The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 4q ldquo 4t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquot4 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquo 15t4 15u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquot4 15 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u
- Motivation
- Background
  - Fundamentals of Formal Concept Analysis
  - Pattern Structures
- RDF Pattern Structures
- Creating Views over RDF-Graphs
  - View By
- Completing RDF Data
  - Motivation
  - Methodology
  - Experimentation & Evaluation
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
  - Heterogeneous Pattern Structures
  - RDF Pattern Structures
Perspectives
• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
• Perform large-scale experiments.
Thank you for your attention
References
Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.
Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37–54.
Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129–142. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342–1347.
Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190–208. Springer.
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153–168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2–21. CEUR-WS.org.
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449–473.
From FCA to Pattern Structures
• Triples do not always contain URIs as objects.
• They may contain different data types and structures, including dates, numbers, collections and strings.
• To deal with such data we use Pattern Structures.
[Figure: RDF graph around the resource 350GT, with the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edges labelled dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]
Why Pattern Structures
Heterogeneous context with numeric values (binary attributes a to g spread over the contexts $K_A$, $K_B$, $K_C$, $K_D$, $K_E$, plus the numeric context $K_{dbo:productionStartYear}$):
Entity               Crosses (among a to g)   dbo:productionStartYear
Reventon             × × × × × ×              $\langle[2008, 2008]\rangle$
Countach             × × × × ×                $\langle[1974, 1974]\rangle$
350GT                × × × × ×                $\langle[1963, 1963]\rangle$
400GT                × × × ×                  $\langle[1965, 1965]\rangle$
Islero               × × × ×                  $\langle[1967, 1967]\rangle$
Veneno               × ×                      $\langle[2012, 2012]\rangle$
Aventador Roadster   × ×                      -
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \prod D_p \ (p \in P)$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
Heterogeneous Pattern Structures
Using the heterogeneous context above, let $A1 = \{350GT, 400GT, Islero\}$. Then:
• $(A1)^{\circ\circ}$:
  - $(A1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A1)''$:
  - $(A1)' = [1963, 1967]$ and $[1963, 1967]' = A1$
  - $(A1)'' = A1$
• $(A1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A1 = A1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$
  "Automobiles manufactured by Lamborghini between 1963 and 1967"
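The numeric part of this example can be retraced with a minimal Python sketch. It assumes the usual interval pattern structure, where the description of a set of entities is the smallest interval covering their values. The names `START_YEAR`, `BINARY_CLOSURE_A1`, `interval_description` and `interval_extent` are illustrative, and the closure in the binary contexts is copied from the slide rather than recomputed, since the positions of the crosses are not recoverable from the transcript.

```python
# Production start years as listed in the heterogeneous context on the slide.
# Aventador Roadster has no value ("-") and is therefore left out.
START_YEAR = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}

# Closure of A1 in the binary contexts, taken as given from the slide (assumption:
# it is not recomputed here).
BINARY_CLOSURE_A1 = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_description(entities):
    """Smallest interval covering the start years of the given entities."""
    years = [START_YEAR[e] for e in entities]
    return (min(years), max(years))


def interval_extent(interval):
    """All entities whose start year falls inside the interval."""
    low, high = interval
    return {e for e, year in START_YEAR.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
description = interval_description(A1)        # the interval [1963, 1967]
numeric_closure = interval_extent(description)
# The heterogeneous closure intersects the closures obtained in each context.
heterogeneous_closure = BINARY_CLOSURE_A1 & numeric_closure
print(description, sorted(heterogeneous_closure))
```

Intersecting the two closures is what removes Reventon and Countach, which share the binary attributes of A1 but fall outside the interval.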
The proposition of [Carpineto and Romano 1996]
• Let $K = (G, M, I)$ be a formal context.
• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.
• Extended intersection operation on $s1 = \{C1, C2, C7\}$ and $s2 = \{C6, C8, C9\}$.
Entities S   Attributes (among C1, C2, C4, C5, C6, C7, C8, C9)
s1           C1, C2, C7
s2           C6, C8, C9
s3           x x
s4           x x x
s5           x x
[Figure: subsumption hierarchy over ⊤, C10 to C15 and the attributes C1, C2, C4 to C9]
The proposition of [Ganter and Kuznetsov 2001]
• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$, where $M$ here stands for the set of maximal elements of $Idl$.
• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC1$ and $AC2$, the infimum $AC1 \sqcap AC2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac1 \in AC1, \exists ac2 \in AC2 : m \le ac1 \text{ and } m \le ac2\}$.
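The definition of the infimum can be tried out on a toy poset. The sketch below is only an illustration under stated assumptions: the poset (the divisors of 12 ordered by divisibility) and the names `ELEMENTS`, `leq`, `maximal` and `infimum` are hypothetical and do not come from the slides.

```python
# Hypothetical finite poset: divisors of 12, where x <= y means "x divides y".
ELEMENTS = [1, 2, 3, 4, 6, 12]


def leq(x, y):
    """Order relation of the illustrative poset: x divides y."""
    return y % x == 0


def maximal(elements):
    """Elements that are not strictly below another element of the set."""
    return {x for x in elements
            if not any(x != y and leq(x, y) for y in elements)}


def infimum(ac1, ac2):
    """Maximal elements of the ideal of common lower bounds of ac1 and ac2."""
    ideal = {m for m in ELEMENTS
             if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)}
    return maximal(ideal)


# {3, 4} and {6} are antichains; their common lower bounds are {1, 2, 3}.
print(sorted(infimum({3, 4}, {6})))
```

The ideal is built exactly as in the set expression of the definition, and the antichain is then recovered by keeping its maximal elements.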
Reduction of LCS to RMQ
• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.
Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5
Computational Time
The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
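One common way to answer such queries in constant time is a sparse table. The sketch below is an illustration, not the scheme the slide refers to: a sparse table needs $O(n \log n)$ preprocessing rather than $O(n)$, and the class name `SparseTableRMQ` is an assumption. Positions are 1-based, as on the slide.

```python
class SparseTableRMQ:
    """Range Minimum Query via a sparse table (O(n log n) build, O(1) query)."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[j][i] is the 0-based index of the minimum in values[i : i + 2**j].
        self.table = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            prev = self.table[j - 1]
            half = 1 << (j - 1)
            row = []
            for i in range(n - (1 << j) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            j += 1

    def query(self, lo, hi):
        """1-based position of the leftmost minimum between positions lo and hi."""
        if lo > hi:
            lo, hi = hi, lo
        lo -= 1
        hi -= 1
        k = (hi - lo + 1).bit_length() - 1
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        return (left if self.values[left] <= self.values[right] else right) + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(1, 5), rmq.query(4, 5), rmq.query(1, 2))
```

Each query combines two overlapping windows of length $2^k$ that together cover the requested range, which is why no loop is needed at query time.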
Naive Approach
For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have, in terms of positions in the sequence below, $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.
Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
• $RMQ(4, 4) = 4$, partial result $\{4\}$
• $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
• $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e., the antichain $\{C1, C13\}$.
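The pairwise computation on this slide can be reproduced with a short Python sketch. The tree in `CHILDREN` is read off the node sequence shown above, with `"TOP"` standing for the top element. The function names are illustrative, and `rmq` is a plain linear scan used for brevity, not a constant-time structure.

```python
from itertools import product

# Tree read off the node sequence on the slide; "TOP" stands for the top element.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root, children):
    """Visited nodes and their depths, re-listing a node after each of its children."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, i, j):
    """1-based position of the leftmost minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def is_strict_ancestor(u, v, parent):
    """True if u lies strictly above v in the tree."""
    while v in parent:
        v = parent[v]
        if v == u:
            return True
    return False


def naive_intersection(a, b, root, children):
    nodes, depths = euler_tour(root, children)
    first = {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)  # first position of each node in the tour
    parent = {c: p for p, cs in children.items() for c in cs}
    # One RMQ per pair (x, y) with x in a and y in b.
    positions = {rmq(depths, first[x], first[y]) for x, y in product(a, b)}
    candidates = {nodes[p - 1] for p in positions}
    # Keep only the most specific candidates (drop ancestors of other candidates).
    result = {u for u in candidates
              if not any(is_strict_ancestor(u, v, parent) for v in candidates)}
    return positions, result


NODES, DEPTHS = euler_tour("TOP", CHILDREN)
POSITIONS, RESULT = naive_intersection(
    {"C1", "C5", "C8"}, {"C1", "C7", "C9"}, "TOP", CHILDREN)
print(sorted(POSITIONS), sorted(RESULT))
```

The positions collected before filtering match the set $\{4, 8, 15, 19, 21\}$ on the slide; dropping the ancestors C12 and ⊤ leaves C1 and C13.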
An associated scaling
• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.
[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]
$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$
$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$
$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$ (the minimal elements of the intersection).
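The filter-based computation can be sketched as follows. The `PARENT` map encodes the tree from the slide, with `"TOP"` standing for the top element; the helper names `up_set`, `filter_of` and `minimal_elements` are illustrative assumptions.

```python
# Parent of each node in the tree from the slide; "TOP" is the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Elements greater than or equal to at least one element of the antichain."""
    return set().union(*(up_set(x) for x in antichain))


def minimal_elements(elements):
    """Elements with no other element of the set strictly below them."""
    return {x for x in elements
            if not any(y != x and x in up_set(y) for y in elements)}


FIL_A = filter_of({"C1", "C5", "C8"})
FIL_B = filter_of({"C1", "C7", "C9"})
COMMON = FIL_A & FIL_B
print(sorted(COMMON), sorted(minimal_elements(COMMON)))
```

Both filters are materialised in full before they are intersected, which is the cost that the complexity comparison between scaling and RMQ refers to.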
Complexity
Complexity of Naive Approach
The number of RMQs is $O(|A| \cdot |B|)$, since every element of $A$ is paired with every element of $B$.
Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.
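As a rough illustration, these bounds can be instantiated on the running example. The counts below are read off the example tree and the position sets $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$; they are a small worked instance, not figures stated on the slides.

```latex
\begin{align*}
|A| \cdot |B| &= 3 \cdot 3 = 9
  && \text{RMQs in the naive approach} \\
|A \cup B| &= |\{4, 12, 18, 20, 22\}| = 5 \le |A| + |B| = 6
  && \text{hence } 5 - 1 = 4 \text{ pairs of consecutive positions} \\
|T| &= 13, \quad |Leaves(T)| = 7
  && \text{nodes and leaves of the example tree}
\end{align*}
```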
About complexity of the approach
• The approach based on filters has a higher complexity than the approach based on RMQ: $O(|T|)$ for intersecting two antichains by means of a scaling, versus $O(|Leaves(T)|)$ for intersecting antichains directly.
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
Perspectives
- Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
- How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?
- Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
- Improve these frameworks, i.e., RDF Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account.
- Perform large-scale experiments.
Thank you for your attention
8 References
Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.
Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.
Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.
Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.
From FCA to Pattern Structures
- Triples do not always contain URIs as objects.
- They may have different data types and structures, including dates, numbers, collections and strings.
- To deal with such data we use Pattern Structures.
[Figure: RDF graph around the entity 350GT, with nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edges labelled dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]
Why Pattern Structures
- Triples do not always contain URIs as objects.
- They may have different data types and structures, including dates, numbers, collections and strings.
- To deal with such data we use Pattern Structures.
Heterogeneous context with numeric values
Contexts: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a, b, c, d, e, f, g) and $K_{dbo:productionStartYear}$
Entity               a-g            dbo:productionStartYear
Reventon             × × × × × ×    $\langle[2008, 2008]\rangle$
Countach             × × × × ×      $\langle[1974, 1974]\rangle$
350GT                × × × × ×      $\langle[1963, 1963]\rangle$
400GT                × × × ×        $\langle[1965, 1965]\rangle$
Islero               × × × ×        $\langle[1967, 1967]\rangle$
Veneno               × ×            $\langle[2012, 2012]\rangle$
Aventador Roadster   × ×            -
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
- Let $p \in P$ be a predicate:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
- Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
- Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
Heterogeneous Pattern Structures
Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$; then:
- $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
- $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
- Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$
"Automobiles manufactured by Lamborghini between 1963 and 1967"
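The interval part of this example can be checked with a short Python sketch. It assumes the production start years listed in the heterogeneous context, takes the closure over the binary contexts as given on the slide (the column positions of the crosses are not available here), and uses hypothetical helper names `interval_of` and `extent_of`.

```python
# Production start years from the heterogeneous context; None marks a missing value.
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}


def interval_of(entities):
    """Smallest interval covering the years of the given entities (all must have a year)."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent_of(interval):
    """Entities whose year lies inside the interval; entities without a year are excluded."""
    lo, hi = interval
    return {e for e, y in YEARS.items() if y is not None and lo <= y <= hi}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)            # description of A1 in the interval pattern structure
interval_closure = extent_of(interval)

# Closure of A1 over the binary contexts, copied from the slide rather than computed.
binary_closure = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

heterogeneous_closure = binary_closure & interval_closure
print(interval, sorted(heterogeneous_closure))
```

The heterogeneous closure is the intersection of the closures obtained in each component, which is why Reventon and Countach drop out once the year interval is taken into account.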
The proposition of [Carpineto and Romano 1996]
- Let $K = (G, M, I)$ be a formal context.
- An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.
- Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.
Entities S   C1 C2 C4 C5 C6 C7 C8 C9
s1           x x x
s2           x x x
s3           x x
s4           x x x
s5           x x
[Figure: subsumption hierarchy over $\top$, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
The proposition of [Ganter and Kuznetsov 2001]
- In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
- More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
- Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$.
- The maximal elements of an ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.
- Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
- In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
- For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$.
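As a worked instance of this infimum, assume a tree ordered so that a concept lies below its specializations, with chains $\top < C_{12} < C_{10} < C_1$, $\top < C_{12} < C_{11} < C_5$ and $\top < C_6 < C_{13} < C_7, C_8, C_9$. This orientation is an assumption, chosen so that the attribute sets of entities are order ideals as required above. For $AC_1 = \{C_1, C_5, C_8\}$ and $AC_2 = \{C_1, C_7, C_9\}$:

```latex
\begin{align*}
\{m \mid m \le C_1 \text{ and } m \le C_1\} &= \{\top, C_{12}, C_{10}, C_1\} \\
\{m \mid m \le C_5 \text{ and } m \le C_1\} &= \{\top, C_{12}\} \\
\{m \mid m \le C_8 \text{ and } m \le C_7\} &= \{m \mid m \le C_8 \text{ and } m \le C_9\} = \{\top, C_6, C_{13}\} \\
\text{all remaining pairs} &\mapsto \{\top\} \\
\text{order ideal} &= \{\top, C_{12}, C_{10}, C_1, C_6, C_{13}\} \\
AC_1 \sqcap AC_2 &= \{C_1, C_{13}\}
\end{align*}
```

The maximal elements of the resulting ideal are $C_1$ and $C_{13}$, since every other member lies below one of them.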
Reduction of LCS to RMQ
- Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.
- Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.
Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5
Computational Time
The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
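A range minimum query with constant-time lookups can be sketched with a sparse table. Note that this is a simpler, swapped-in technique: its preprocessing is $O(n \log n)$, not the $O(n)$ quoted above, which requires a more involved construction. The function names are hypothetical; positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """For every window whose length is a power of two, store the index of its minimum."""
    n = len(values)
    table = [list(range(n))]              # level 0: windows of length 1
    length = 2
    while length <= n:
        prev = table[-1]
        half = length // 2
        row = []
        for start in range(n - length + 1):
            left, right = prev[start], prev[start + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        length *= 2
    return table


def rmq(values, table, lo, hi):
    """1-based position of a minimal value among positions lo..hi (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    i, j = lo - 1, hi - 1                 # switch to 0-based indices
    level = (j - i + 1).bit_length() - 1  # largest power of two fitting in the range
    left = table[level][i]
    right = table[level][j - (1 << level) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 1, 5))
print(rmq(array, table, 4, 5))
```

Each query combines two overlapping power-of-two windows that together cover the requested range, which is what makes the lookup constant time.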
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have, in terms of positions in the array D, $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.
Position    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node        ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D           0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
- $\mathrm{RMQ}(4, 4) = 4$, giving $\{4\}$
- $\mathrm{RMQ}(4, 18) = 15$, giving $\{4, 15\}$
- $\mathrm{RMQ}(20, 18) = 19$, giving $\{4, 15, 19\}$
- Result $= \{4, 8, 15, 19, 21\}$, which reduces to the antichain $\{4, 21\}$
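The naive approach can be replayed on the array D with a small Python sketch. The assumptions are labeled here and in the comments: the RMQ is a plain linear scan rather than a constant-time structure, `"TOP"` stands for $\top$, the final step keeps only the most specific nodes, and the names `rmq` and `naive_intersection` are hypothetical.

```python
from itertools import product

# Node labels and depths of the array D; list index 0 corresponds to position 1.
NODES = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the first minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    window = DEPTH[lo - 1:hi]
    return lo + window.index(min(window))


def naive_intersection(a_positions, b_positions):
    """One RMQ per pair (a, b), then keep only the most specific nodes found."""
    hits = {rmq(a, b) for a, b in product(a_positions, b_positions)}
    first_pos = {}
    for p in sorted(hits):
        first_pos.setdefault(NODES[p - 1], p)

    def is_ancestor(u, v):
        # u is above (or equal to) v when their common subsumer is u itself.
        return NODES[rmq(first_pos[u], first_pos[v]) - 1] == u

    most_specific = {u for u in first_pos
                     if not any(u != v and is_ancestor(u, v) for v in first_pos)}
    return hits, most_specific


hits, antichain = naive_intersection([4, 12, 20], [4, 18, 22])
print(sorted(hits), sorted(antichain))
```

With three elements on each side the sketch issues nine queries for the pairs, which is the $|A| \cdot |B|$ behaviour of the naive approach.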
8 References
Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.
Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.
Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.
Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.
From FCA to Pattern Structures
• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections, and strings.
• To deal with such data, we use Pattern Structures.
[Figure: RDF graph around the entity 350GT, with edges dc:subject to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to the literal 1964 (xsd:date).]
Why Pattern Structures
Heterogeneous context with numeric values (subcontexts $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ over the binary attributes a, b, c, d, e, f, g, and $K_{dbo:productionStartYear}$):

Entity               Crosses among a-g   dbo:productionStartYear
Reventon             × × × × × ×         $\langle [2008, 2008] \rangle$
Countach             × × × × ×           $\langle [1974, 1974] \rangle$
350GT                × × × × ×           $\langle [1963, 1963] \rangle$
400GT                × × × ×             $\langle [1965, 1965] \rangle$
Islero               × × × ×             $\langle [1967, 1967] \rangle$
Veneno               × ×                 $\langle [2012, 2012] \rangle$
Aventador Roadster   × ×                 -
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
• Let $p \in P$ be a predicate:
- if $\mathrm{range}(p) \subseteq U$, $p$ is an object relation
- if $\mathrm{range}(p) \subseteq L$, $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
- $H = \bigtimes_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
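The definitions above can be illustrated with a minimal Python sketch. It assumes two predicates only: dc:subject treated as an object relation whose descriptions are sets of URIs (similarity taken as set intersection), and dbo:productionStartYear treated as a literal relation whose descriptions are intervals (similarity taken as the smallest covering interval, as in interval pattern structures). The function names, the dictionary layout and the dc:subject values given for 400GT are hypothetical and only serve the illustration.

```python
# Hypothetical per-predicate similarity operators (names are illustrative only).
def meet_object_relation(d1, d2):
    """Similarity of two sets of URIs: their intersection."""
    return d1 & d2


def meet_literal_interval(d1, d2):
    """Similarity of two intervals: the smallest interval covering both."""
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


# One similarity operator per predicate p, i.e. one semilattice (D_p, meet).
MEET = {
    "dc:subject": meet_object_relation,
    "dbo:productionStartYear": meet_literal_interval,
}


def heterogeneous_meet(t1, t2):
    """Component-wise similarity of two tuples taken from H."""
    return {p: MEET[p](t1[p], t2[p]) for p in MEET}


# Delta maps each entity to a tuple with one component per predicate.
# The dc:subject set of 400GT is an assumption made for this example.
DELTA = {
    "350GT": {
        "dc:subject": frozenset({"dbpc:Sports_cars", "dbpc:Lamborghini_vehicles"}),
        "dbo:productionStartYear": (1963, 1963),
    },
    "400GT": {
        "dc:subject": frozenset({"dbpc:Lamborghini_vehicles"}),
        "dbo:productionStartYear": (1965, 1965),
    },
}

common = heterogeneous_meet(DELTA["350GT"], DELTA["400GT"])
print(common)
```

Each component is compared with the operator of its own predicate, which is what lets URIs and numeric values live in the same description.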
Heterogeneous Pattern Structures
Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:
• $(A_1)^{\circ\circ}$
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$
"Automobiles manufactured by Lamborghini between 1963 and 1967"
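The numeric part of this example, $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$, can be checked with a short Python sketch. It uses only the production start years of the heterogeneous context; the function names are hypothetical, and treating the missing year of Aventador Roadster as "never inside an interval" is an assumption of the sketch.

```python
# Production start years from the heterogeneous context (None = no value).
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    "Aventador Roadster": None,
}


def interval_intent(entities):
    """Smallest interval covering the years of the given entities.

    Assumes every entity in the argument has a year."""
    values = [YEARS[g] for g in entities]
    return (min(values), max(values))


def interval_extent(interval):
    """All entities whose year lies inside the interval."""
    low, high = interval
    return {g for g, y in YEARS.items() if y is not None and low <= y <= high}


A1 = {"350GT", "400GT", "Islero"}
intent = interval_intent(A1)
closure = interval_extent(intent)
print(intent, sorted(closure))
```

The closure gives back $A_1$ itself, which is why intersecting it with the larger extent obtained from the binary attributes still yields $A_1$.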
The proposition of [Carpineto and Romano 1996]
• Let $K = (G, M, I)$ be a formal context.
• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.
[Table: formal context with entities s1 to s5 and attributes C1, C2, C4, C5, C6, C7, C8, C9; s1 and s2 have the three attributes listed above, s3 has two, s4 has three, s5 has two.]
[Figure: subsumption hierarchy rooted at ⊤ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9.]
The proposition of [Ganter and Kuznetsov 2001]
• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$, with $y$ ranging over those maximal elements.
• The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
The proposition of [Ganter and Kuznetsov 2001]
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$.
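The infimum of two antichains can be sketched directly from this definition. The Python example below is a brute-force illustration over a small finite poset, not an efficient method; the function name and the choice of the divisibility order on the integers 1 to 12 as the example poset are assumptions made only for the illustration.

```python
def antichain_infimum(ac1, ac2, elements, leq):
    """Maximal elements of {m | m <= some ac1 in AC1 and m <= some ac2 in AC2}."""
    ideal = {
        m
        for m in elements
        if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)
    }
    # Keep only the elements that have no strictly larger element in the ideal.
    return {m for m in ideal if not any(m != n and leq(m, n) for n in ideal)}


def divides(x, y):
    """Example partial order: x <= y iff x divides y."""
    return y % x == 0


M = range(1, 13)
result = antichain_infimum({4, 9}, {6, 10}, M, divides)
print(sorted(result))
```

Here the common ideal is {1, 2, 3}: these are the elements that divide 4 or 9 and also divide 6 or 10. Its maximal elements form the antichain {2, 3}.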
Reduction of LCS to RMQ
• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time
Finding such a position takes $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
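A range minimum query on the array above can be sketched as follows. The $O(n)$ preprocessing scheme is more involved, so this sketch swaps in a sparse table, which needs $O(n \log n)$ preprocessing but still answers each query in $O(1)$. Positions are 1-based as on the slide; the function names are hypothetical.

```python
def build_sparse_table(values):
    """table[k][i] = 0-based index of a minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """1-based position of a minimal value among positions lo..hi (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    lo, hi = lo - 1, hi - 1  # switch to 0-based indices
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return (left if values[left] <= values[right] else right) + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 1, 5), rmq(ARRAY, TABLE, 4, 5), rmq(ARRAY, TABLE, 1, 2))
```

For the whole array the query returns position 3, where the value 0 is stored.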
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions in the array below are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D (depth)  0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

An RMQ is computed for each pair of positions, one from $A$ and one from $B$, for example:
- $RMQ(4, 4) = 4$, giving $\{4\}$
- $RMQ(4, 18) = 15$, giving $\{4, 15\}$
- $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$ once only the most specific nodes are kept (positions 19 and 21 both hold $C_{13}$).
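The naive approach can be replayed with the following Python sketch. The tree is an assumption read off the node sequence of the slide (⊤ is written "TOP"), the RMQ is a plain linear scan rather than a constant-time structure, and all function names are hypothetical.

```python
from itertools import product

# Tree assumed from the node sequence on the slide ("TOP" stands for the root).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def tour(root):
    """Depth-first walk recording every visited node and its depth."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # back at the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, D = tour("TOP")
FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)  # 1-based position of the first visit


def rmq(i, j):
    """1-based position of the smallest depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def ancestors(node):
    out = set()
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out


def naive_intersection(a, b):
    """One RMQ per pair, then keep only the most specific resulting nodes."""
    positions = {rmq(FIRST[x], FIRST[y]) for x, y in product(a, b)}
    candidates = {NODES[p - 1] for p in positions}
    redundant = set().union(*(ancestors(n) for n in candidates))
    return positions, candidates - redundant


positions, antichain = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(positions), sorted(antichain))
```

The nine pairwise queries produce the positions {4, 8, 15, 19, 21}, and dropping the nodes that subsume another candidate leaves C1 and C13.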
An associated scaling
• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain.
[Figure: tree rooted at ⊤, with ⊤ above C12 and C6; C12 above C10 and C11; C10 above C1 and C2; C11 above C4 and C5; C6 above C13; C13 above C7, C8 and C9.]
$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$
$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$
$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
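The scaling to filters can be reproduced with a few lines of Python. The parent map below encodes the tree of the slide, with "TOP" standing for ⊤; the function names are hypothetical.

```python
# Parent of each node in the tree of the slide ("TOP" is the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def fil(antichain):
    """Filter of an antichain: everything above at least one of its elements."""
    return set().union(*(up_set(n) for n in antichain))


def minimal_elements(nodes):
    """Nodes of the set that have no other node of the set below them."""
    return {n for n in nodes if not any(m != n and n in up_set(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
print(sorted(common), sorted(minimal_elements(common)))
```

Both filters contain a large part of the tree even though the final antichain has only two elements, which is the cost discussed in the complexity comparison.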
Complexity
Complexity of Naive Approach
The number of RMQs is $O(|A||B|)$, one per pair of elements from $A$ and $B$.
Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.
About complexity of the approach
• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
References II
Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial
Intelligence pages 13421347
Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13
2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer
Mehwish Alam Interactive Knowledge Discovery over Web of Data 57
References III
Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June
23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer
Science pages 153168 Springer
Rudolph S (2006)Relational exploration combining description logics and formal concept
analysis for knowledge specicationPhD thesis Dresden University of Technology
Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg
Mehwish Alam Interactive Knowledge Discovery over Web of Data 58
References IV
Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473
Mehwish Alam Interactive Knowledge Discovery over Web of Data 59
From FCA to Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
dbpcSports_cars dbpcLamborghini_vehicles
1964(xsddate)
350GT
dboAutomobile dbpLamborghini
dcsubject dcsubject
rdftype dbomanufacturer
dboproductionYear
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Why Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Heterogeneous context with numeric values
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Heterogeneous Pattern Structures[Codocedo and Napoli 2014]
sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation
sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp
sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo
Ś
Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp
Mehwish Alam Interactive Knowledge Discovery over Web of Data 61
Heterogeneous Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q
˝˝
- pA1q˝ ldquo ta b c du
- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q
˝˝ ldquo tReventonCountach 350GT 400GT Islerou
sbquo pA1q2
- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1
- pA1q2 ldquo A1
sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1
sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq
ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo
Mehwish Alam Interactive Knowledge Discovery over Web of Data 62
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 63
The proposition of [Ganter and Kuznetsov 2001]
• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in \max(Idl),\ x \leq y\}$
• The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
The proposition of [Ganter and Kuznetsov 2001]
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2:\ m \leq ac_1 \text{ and } m \leq ac_2\}$
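The infimum of two antichains described above can be sketched in a few lines. This is only an illustration under stated assumptions: the poset (the divisors of 12 ordered by divisibility), the names `downset`, `maximal` and `infimum`, and the brute-force enumeration are hypothetical choices made for this example, not something taken from the slides.

```python
def downset(antichain, elements, leq):
    """Order ideal generated by an antichain: everything below some member."""
    return {m for m in elements if any(leq(m, a) for a in antichain)}


def maximal(subset, leq):
    """Maximal elements of a finite subset of the poset."""
    return {x for x in subset
            if not any(x != y and leq(x, y) for y in subset)}


def infimum(ac1, ac2, elements, leq):
    """Maximal elements of {m | m <= some ac1 in AC1 and m <= some ac2 in AC2}."""
    ideal = downset(ac1, elements, leq) & downset(ac2, elements, leq)
    return maximal(ideal, leq)


# Hypothetical toy poset: divisors of 12, with x <= y iff x divides y.
ELEMENTS = {1, 2, 3, 4, 6, 12}


def divides(x, y):
    return y % x == 0


if __name__ == "__main__":
    # {3, 4} and {6} are antichains for divisibility.
    print(sorted(infimum({3, 4}, {6}, ELEMENTS, divides)))
```

The common ideal of {3, 4} and {6} is {1, 2, 3}, whose maximal elements are 2 and 3, so the result is again an antichain.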
Reduction of LCS to RMQ
• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array
Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5
Computational Time
The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
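To make the RMQ problem concrete, here is a minimal sketch using a sparse table. Note the swap: a sparse table needs O(n log n) preprocessing, not the O(n) bound quoted on the slide, but it is short and still answers each query in O(1). The function names and the 1-based position convention are assumptions chosen to match the slide's array.

```python
def build_sparse_table(values):
    """table[j][i] = 0-based index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """1-based position of a minimal value between positions lo and hi (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    lo0, hi0 = lo - 1, hi - 1
    k = (hi0 - lo0 + 1).bit_length() - 1   # largest power of two fitting the range
    left = table[k][lo0]
    right = table[k][hi0 - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


if __name__ == "__main__":
    array = [2, 1, 0, 3, 2]                 # the array shown on the slide
    table = build_sparse_table(array)
    print(rmq(array, table, 1, 5))          # whole array
    print(rmq(array, table, 4, 5))          # last two positions
```

On the slide's array, the minimum over positions 1 to 5 is the value 0 at position 3.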
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ as positions in the tour below
Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Steps shown on the successive builds of this slide:
- RMQ(4, 4) = 4, partial result {4}
- RMQ(4, 18) = 15, partial result {4, 15}
- RMQ(20, 18) = 19, partial result {4, 15, 19}
- Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the nodes C1 and C13
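The naive procedure can be replayed with a short sketch. Assumptions made for this illustration only: the tree is the one implied by the tour on the slide (with "T" standing for ⊤), the RMQ is a plain linear scan rather than a constant-time structure, and all function names are hypothetical.

```python
# Tree implied by the tour on the slide; "T" stands for the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}


def euler_tour(root):
    """Visit order and depths, re-listing a node after each of its children."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, i, j):
    """1-based position of the leftmost minimum between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    window = depths[lo - 1:hi]
    return lo + window.index(min(window))


def is_proper_ancestor(u, v):
    while v in PARENT:
        v = PARENT[v]
        if v == u:
            return True
    return False


def naive_intersection(a_nodes, b_nodes):
    """One RMQ per pair (a, b), then keep only the most specific subsumers."""
    nodes, depths = euler_tour("T")
    first = {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)
    positions = {rmq(depths, first[a], first[b]) for a in a_nodes for b in b_nodes}
    candidates = {nodes[p - 1] for p in positions}
    result = {c for c in candidates
              if not any(is_proper_ancestor(c, o) for o in candidates)}
    return positions, result


if __name__ == "__main__":
    positions, result = naive_intersection(["C1", "C5", "C8"], ["C1", "C7", "C9"])
    print(sorted(positions), sorted(result))
```

The nine pairwise queries produce the positions {4, 8, 15, 19, 21}; dropping the nodes that subsume another candidate leaves C1 and C13.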
An associated scaling
• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain
[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (over C1, C2) and C11 (over C4, C5); C6 has child C13 (over C7, C8, C9)]
$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$
$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$
$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
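The scaling to filters can be checked with a small sketch. It assumes the tree from the figure (with "T" standing for ⊤) encoded as a hypothetical `PARENT` map; `filter_of` and `minimal_elements` are illustrative names, not part of any library.

```python
# Parent map of the tree on the slide; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes larger than or equal to some member: each member plus its ancestors."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def minimal_elements(upset):
    """Minimal nodes of an upward-closed set: those with no child inside the set."""
    with_child_inside = {PARENT[n] for n in upset if n in PARENT}
    return upset - with_child_inside


if __name__ == "__main__":
    fil_a = filter_of({"C1", "C5", "C8"})
    fil_b = filter_of({"C1", "C7", "C9"})
    print(sorted(minimal_elements(fil_a & fil_b)))
```

Both filters are walked in full before intersecting, which is where the O(|T|) cost discussed on the complexity slides comes from.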
Complexity
Complexity of Naive Approach
The number of RMQs is $O(|A||B|)$, one per pair of elements from $A$ and $B$
Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
About complexity of the approach
• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling
- Motivation
- Background
  - Fundamentals of Formal Concept Analysis
  - Pattern Structures
- RDF Pattern Structures
- Creating Views over RDF-Graphs
  - View By
- Completing RDF Data
  - Motivation
  - Methodology
  - Experimentation & Evaluation
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
  - Heterogeneous Pattern Structures
  - RDF Pattern Structures
References III
Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.
Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.
Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.
References IV
Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.
From FCA to Pattern Structures
• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures
[Figure: RDF graph around the resource 350GT, with the edges rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles and dbo:productionYear 1964 (xsd:date)]
Why Pattern Structures
• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures
Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear; attributes a b c d e f g (g is the year; the columns of the crosses among a to f are not preserved here)
Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -
Heterogeneous context with numeric values
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
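The definitions above can be written out as follows. The first equation restates the slide; the component-wise similarity and the derivation of a set of entities are the usual construction on a product of semilattices and are given here as an assumption, since the slide does not spell them out. The symbol $\sqcap_p$ for the similarity operation of $D_p$ is notation introduced for this sketch.

```latex
\[
  \Delta(g) = \bigl(\delta_p(g)\bigr)_{p \in P} \in H = \prod_{p \in P} D_p
\]
\[
  (d_p)_{p \in P} \sqcap (e_p)_{p \in P} = \bigl(d_p \sqcap_p e_p\bigr)_{p \in P},
  \qquad
  A^{\diamond} = \bigsqcap_{g \in A} \Delta(g) \quad \text{for } A \subseteq G
\]
```

Under this reading, each component of a heterogeneous description is computed independently inside its own pattern structure $K_p$.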
Heterogeneous Pattern Structures
Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear; attributes a b c d e f g (g is the year; the columns of the crosses among a to f are not preserved here)
Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -
Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then
• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$
"Automobiles manufactured by Lamborghini between 1963 and 1967"
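The numeric part of this example, $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$, can be replayed with a small sketch. It assumes the interval description of a set of entities is the smallest interval covering their years; the dictionary `YEARS` copies the year column of the table, and `describe` and `extent` are hypothetical helper names.

```python
# Production start years from the table; Aventador Roadster has no value and is left out.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}


def describe(entities):
    """Smallest interval covering the years of the given entities."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent(interval):
    """All entities whose year falls inside the interval."""
    lo, hi = interval
    return {e for e, y in YEARS.items() if lo <= y <= hi}


if __name__ == "__main__":
    a1 = {"350GT", "400GT", "Islero"}
    interval = describe(a1)
    print(interval, sorted(extent(interval)))
```

Countach (1974) falls outside the interval, which is why intersecting with this extent cuts the five cars sharing {a, b, c, d} back down to $A_1$.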
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 63
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 64
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 65
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 66
The proposition of [Ganter and Kuznetsov 2001]
sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets
sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order
sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu
sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal
sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable
Mehwish Alam Interactive Knowledge Discovery over Web of Data 67
The proposition of [Ganter and Kuznetsov 2001]
sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set
sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Reduction of LCS to RMQ
sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query
sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array
Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5
Computational Time
The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 4q ldquo 4t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquot4 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquo 15t4 15u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquot4 15 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u
B ldquo tC1C7C9u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u FilpAq ldquo tC1C10C12JC5C11C8C13C6u
B ldquo tC1C7C9u FilpBq ldquo tC1C10C12JC7C9C13C6u
FilpAq X FilpBq ldquo tC1C10C12JC13C6u which yields the antichain tC1C13u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
Complexity
Complexity of Naive Approach
The number of RMQs of consecutive elements is Op|A||B|q
Complexity of Improved Approach
The cardinality of AY B is less then |A| ` |B| hence the number of theconsecutive elements is Op|A| ` |B|q and thus the number of RMQs ofconsecutive elements is Op|A| ` |B|q
Complexity of associated scaling
The approach based on lters has a higher complexity than the approach basedon RMQ The size of a lter is Op|T |q and thus the computational complexityof intersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
Mehwish Alam Interactive Knowledge Discovery over Web of Data 71
About complexity of the approach
sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree
sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction
sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling
Mehwish Alam Interactive Knowledge Discovery over Web of Data 72
- Motivation
- Background
-
- Fundamentals of Formal Concept Analysis
- Pattern Structures
-
- RDF Pattern Structures
- Creating Views over RDF-Graphs
-
- View By
-
- Completing RDF Data
-
- Motivation
- Methodology
- Experimentation amp Evaluation
-
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
-
- Heterogeneous Pattern Structures
- RDF Pattern Structures
-
References IV
Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473
Mehwish Alam Interactive Knowledge Discovery over Web of Data 59
From FCA to Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
dbpcSports_cars dbpcLamborghini_vehicles
1964(xsddate)
350GT
dboAutomobile dbpLamborghini
dcsubject dcsubject
rdftype dbomanufacturer
dboproductionYear
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Why Pattern Structures
sbquo Triples do not always contain URIs as objects
sbquo They may dierent data types and structures including dates numberscollections strings
sbquo To deal with such a data we use Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Heterogeneous context with numeric values
Mehwish Alam Interactive Knowledge Discovery over Web of Data 60
Heterogeneous Pattern Structures[Codocedo and Napoli 2014]
sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation
sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp
sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo
Ś
Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp
Mehwish Alam Interactive Knowledge Discovery over Web of Data 61
Heterogeneous Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q
˝˝
- pA1q˝ ldquo ta b c du
- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q
˝˝ ldquo tReventonCountach 350GT 400GT Islerou
sbquo pA1q2
- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1
- pA1q2 ldquo A1
sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1
sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq
ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo
Mehwish Alam Interactive Knowledge Discovery over Web of Data 62
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 63
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 64
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 65
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 66
The proposition of [Ganter and Kuznetsov 2001]
sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets
sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order
sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu
sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal
sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable
Mehwish Alam Interactive Knowledge Discovery over Web of Data 67
The proposition of [Ganter and Kuznetsov 2001]
sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set
sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Reduction of LCS to RMQ
sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query
sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array
Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5
Computational Time
The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 4q ldquo 4t4u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquot4 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquo 15t4 15u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquot4 15 u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u
B ldquo tC1C7C9u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u FilpAq ldquo tC1C10C12JC5C11C8C13C6u
B ldquo tC1C7C9u FilpBq ldquo tC1C10C12JC7C9C13C6u
FilpAq X FilpBq ldquo tC1C10C12JC13C6u which yields the antichain tC1C13u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
Complexity
Complexity of Naive Approach
The number of RMQs of consecutive elements is Op|A||B|q
Complexity of Improved Approach
The cardinality of AY B is less then |A| ` |B| hence the number of theconsecutive elements is Op|A| ` |B|q and thus the number of RMQs ofconsecutive elements is Op|A| ` |B|q
Complexity of associated scaling
The approach based on lters has a higher complexity than the approach basedon RMQ The size of a lter is Op|T |q and thus the computational complexityof intersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
Mehwish Alam Interactive Knowledge Discovery over Web of Data 71
About complexity of the approach
- The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.
- The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
- When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
- But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
- Motivation
- Background
- Fundamentals of Formal Concept Analysis
- Pattern Structures
- RDF Pattern Structures
- Creating Views over RDF-Graphs
- View By
- Completing RDF Data
- Motivation
- Methodology
- Experimentation & Evaluation
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
- Heterogeneous Pattern Structures
- RDF Pattern Structures
From FCA to Pattern Structures
- Triples do not always contain URIs as objects.
- They may have different data types and structures, including dates, numbers, collections and strings.
- To deal with such data, we use Pattern Structures.
[Figure: RDF graph with the nodes 350GT, dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and 1964 (xsd:date), and the edge labels dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]
Why Pattern Structures
Contexts: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$, $K_{dbo:productionStartYear}$; binary attributes: a, b, c, d, e, f, g

Entity              Crosses among a-g   dbo:productionStartYear
Reventon            × × × × × ×         ⟨[2008, 2008]⟩
Countach            × × × × ×           ⟨[1974, 1974]⟩
350GT               × × × × ×           ⟨[1963, 1963]⟩
400GT               × × × ×             ⟨[1965, 1965]⟩
Islero              × × × ×             ⟨[1967, 1967]⟩
Veneno              × ×                 ⟨[2012, 2012]⟩
Aventador Roadster  × ×                 -

Heterogeneous context with numeric values
Heterogeneous Pattern Structures [Codocedo and Napoli 2014]
- Let $p \in P$ be a predicate:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
- Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
- Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \times_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
Heterogeneous Pattern Structures
In the heterogeneous context with numeric values, let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:
- $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
- $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
- Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$
"Automobiles manufactured by Lamborghini between 1963 and 1967"
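The numeric part of this example can be checked with a short Python sketch. The years come from the heterogeneous context; the closure in the binary contexts is taken as stated on the slide rather than recomputed, because the positions of the crosses are not available here. The function names are hypothetical, and treating "-" as a missing description is an assumption.

```python
# Production start years as intervals; None marks the "-" entry (assumed: no value).
YEAR = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
    "Aventador Roadster": None,
}

# Closure of A1 in the binary contexts, copied from the slide.
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_meet(entities):
    """Smallest interval covering the intervals of the given entities.

    Assumes every entity passed in has a year.
    """
    intervals = [YEAR[e] for e in entities]
    return (min(lo for lo, _ in intervals), max(hi for _, hi in intervals))


def interval_extent(interval):
    """Entities whose interval lies inside the given interval."""
    lo, hi = interval
    return {e for e, iv in YEAR.items()
            if iv is not None and lo <= iv[0] and iv[1] <= hi}


A1 = {"350GT", "400GT", "Islero"}
pattern = interval_meet(A1)
numeric_extent = interval_extent(pattern)
closure = BINARY_CLOSURE & numeric_extent

print(pattern)          # (1963, 1967)
print(closure == A1)    # True
```

The last step mirrors the intersection on the slide: the heterogeneous closure keeps only the entities that survive in every component.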
The proposition of [Carpineto and Romano 1996]
- Let $K = (G, M, I)$ be a formal context.
- The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.
- Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.
[Table: formal context with entities s1 to s5 and attributes C1, C2, C4, C5, C6, C7, C8, C9; s1, s2 and s4 have three crosses each, s3 and s5 have two each]
[Figure: subsumption hierarchy with top element ⊤ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
The proposition of [Ganter and Kuznetsov 2001]
- In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
- More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
- Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$.
- Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
- Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
The proposition of [Ganter and Kuznetsov 2001]
- In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
- For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.
Reduction of LCS to RMQ
- Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
- Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time
The problem of finding such a position can be solved with $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
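A range minimum query can be sketched in Python with a sparse table. Note the swapped-in technique: a sparse table needs $O(n \log n)$ preprocessing rather than the $O(n)$ quoted above, but it already answers each query in $O(1)$ and is much shorter to write down. The function names are hypothetical; positions are 1-based as on the slide.

```python
from typing import List


def build_sparse_table(values: List[int]) -> List[List[int]]:
    """table[k][i] is the 0-based index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            # On ties keep the leftmost position.
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values: List[int], table: List[List[int]], lo: int, hi: int) -> int:
    """1-based position of the minimum of values over positions lo..hi (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    i, j = lo - 1, hi - 1
    k = (j - i + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 1, 5))  # 3
print(rmq(array, table, 4, 5))  # 5
```

For the array on the slide, the minimum over positions 1 to 5 is the value 0 at position 3.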
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have, as positions in the array below, $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D            0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node         ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

Queries shown step by step, with the positions collected so far:
- $RMQ(4, 4) = 4$, giving $\{4\}$
- $RMQ(4, 18) = 15$, giving $\{4, 15\}$
- $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$: positions 19 and 21 both hold $C_{13}$, while positions 8 ($C_{12}$) and 15 ($\top$) hold ancestors of $C_1$.
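The naive approach can be reproduced with the following Python sketch. The tour and depth arrays are copied from the slide, with `TOP` standing for $\top$; the linear-scan `rmq` and the helper names are assumptions made to keep the example short, not the constant-time structure discussed earlier.

```python
from itertools import product

# Euler tour of the tree and the depth of each visited node (positions are 1-based).
TOUR = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
        "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(lo, hi):
    """1-based position of the leftmost minimal depth between lo and hi."""
    lo, hi = min(lo, hi), max(lo, hi)
    return min(range(lo, hi + 1), key=lambda pos: DEPTH[pos - 1])


def naive_candidates(a, b):
    """One RMQ per pair (x, y) with x in a and y in b."""
    return sorted({rmq(x, y) for x, y in product(a, b)})


def most_specific(positions):
    """Drop every node that is a proper ancestor of another returned node."""
    nodes = {TOUR[p - 1] for p in positions}
    first = {n: TOUR.index(n) for n in nodes}
    last = {n: len(TOUR) - 1 - TOUR[::-1].index(n) for n in nodes}

    def is_proper_ancestor(u, v):
        # u encloses v in the Euler tour exactly when u is an ancestor of v.
        return u != v and first[u] <= first[v] and last[v] <= last[u]

    return sorted(n for n in nodes
                  if not any(is_proper_ancestor(n, m) for m in nodes))


A = [4, 12, 20]   # positions of C1, C5, C8
B = [4, 18, 22]   # positions of C1, C7, C9
positions = naive_candidates(A, B)
print(positions)                 # [4, 8, 15, 19, 21]
print(most_specific(positions))  # ['C1', 'C13']
```

The nine pairwise queries yield five distinct positions, and removing ancestors leaves the two nodes that correspond to positions 4 and 21.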
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u
B ldquo tC1C7C9u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
An associated scaling
sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain
J
C12
C10
C1 C2
C11
C4 C5
C6
C13
C7 C8 C9
A ldquo tC1C5C8u FilpAq ldquo tC1C10C12JC5C11C8C13C6u
B ldquo tC1C7C9u FilpBq ldquo tC1C10C12JC7C9C13C6u
FilpAq X FilpBq ldquo tC1C10C12JC13C6u which yields the antichain tC1C13u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 70
Complexity
Complexity of Naive Approach
The number of RMQs of consecutive elements is Op|A||B|q
Complexity of Improved Approach
The cardinality of AY B is less then |A| ` |B| hence the number of theconsecutive elements is Op|A| ` |B|q and thus the number of RMQs ofconsecutive elements is Op|A| ` |B|q
Complexity of associated scaling
The approach based on lters has a higher complexity than the approach basedon RMQ The size of a lter is Op|T |q and thus the computational complexityof intersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
Mehwish Alam Interactive Knowledge Discovery over Web of Data 71
About complexity of the approach
sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly
sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree
sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction
sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling
Mehwish Alam Interactive Knowledge Discovery over Web of Data 72
- Motivation
- Background
-
- Fundamentals of Formal Concept Analysis
- Pattern Structures
-
- RDF Pattern Structures
- Creating Views over RDF-Graphs
-
- View By
-
- Completing RDF Data
-
- Motivation
- Methodology
- Experimentation amp Evaluation
-
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
-
- Heterogeneous Pattern Structures
- RDF Pattern Structures
-
Heterogeneous Pattern Structures[Codocedo and Napoli 2014]
sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation
sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp
sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo
Ś
Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp
Mehwish Alam Interactive Knowledge Discovery over Web of Data 61
Heterogeneous Pattern Structures
KA KB KC KD KE KdboproductionStartYeara b c d e f g
Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -
Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q
˝˝
- pA1q˝ ldquo ta b c du
- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q
˝˝ ldquo tReventonCountach 350GT 400GT Islerou
sbquo pA1q2
- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1
- pA1q2 ldquo A1
sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1
sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq
ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo
Mehwish Alam Interactive Knowledge Discovery over Web of Data 62
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 63
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 64
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 65
The proposition of [Carpineto and Romano 1996]
sbquo Let K ldquo pG M I q be a formal context
sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚
sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u
Entities S C1 C2 C4 C5 C6 C7 C8 C9
s1 x x xs2 x x xs3 x xs4 x x xs5 x x
J
C12
C10
C1 C2
C11
C4 C5
C15
C13
C6
C14
C7 C8 C9
Mehwish Alam Interactive Knowledge Discovery over Web of Data 66
The proposition of [Ganter and Kuznetsov 2001]
sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets
sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order
sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu
sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal
sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable
Mehwish Alam Interactive Knowledge Discovery over Web of Data 67
The proposition of [Ganter and Kuznetsov 2001]
sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set
sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Reduction of LCS to RMQ
sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query
sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array
Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5
Computational Time
The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
Naive Approach
For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u
D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A = {4, 12, 20}, B = {4, 18, 22}
RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}
Result = {4, 8, 15, 19, 21}, which reduces to the antichain {4, 21}, i.e. {C1, C13}

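The naive approach can be reproduced with a short Python sketch. It assumes the tree implied by the Euler tour on the slide, with `TOP` standing for the root ⊤, and it answers each range minimum query by a plain linear scan rather than a constant-time structure. The names `CHILDREN`, `euler_tour`, `rmq` and `naive_meet` are illustrative and do not come from the slides.

```python
from itertools import product

# Tree read off the Euler tour on the slide (parent -> children).
# "TOP" is a stand-in name for the root ⊤.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (nodes, depths) of the Euler tour, 1-based (index 0 is unused)."""
    nodes, depths = [None], [None]

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            # Come back to the parent after each child.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, i, j):
    """Position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p])


def naive_meet(A, B):
    """Query every pair (a, b), then keep only the most specific nodes."""
    nodes, depths = euler_tour("TOP")
    first, last = {}, {}
    for pos in range(1, len(nodes)):
        first.setdefault(nodes[pos], pos)
        last[nodes[pos]] = pos

    positions = {rmq(depths, first[a], first[b]) for a, b in product(A, B)}
    found = {nodes[p] for p in positions}

    def is_proper_ancestor(u, v):
        return u != v and first[u] <= first[v] <= last[u]

    antichain = {u for u in found
                 if not any(is_proper_ancestor(u, v) for v in found)}
    return positions, antichain


positions, antichain = naive_meet(["C1", "C5", "C8"], ["C1", "C7", "C9"])
print(sorted(positions))   # [4, 8, 15, 19, 21]
print(sorted(antichain))   # ['C1', 'C13']
```

Every pair from A and B is queried, which is where the quadratic number of RMQs of the naive approach comes from.
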
An associated scaling
- Scale the antichains to the corresponding filters.
- A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

⊤
  C12
    C10
      C1
      C2
    C11
      C4
      C5
  C6
    C13
      C7
      C8
      C9

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}
B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}
Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}

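The filter-based scaling can be checked with a minimal Python sketch. It assumes the same tree as on the slide, stored as a child-to-parent map with `TOP` standing for ⊤. The helper names `filter_of` and `minimal_elements` are hypothetical.

```python
# Child -> parent map of the tree on the slide; "TOP" stands for the root ⊤.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All elements greater than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)  # None once the root has been passed
    return result


def minimal_elements(elements):
    """Elements that have no other element of the set strictly below them."""
    return {u for u in elements
            if not any(v != u and u in filter_of([v]) for v in elements)}


fil_a = filter_of(["C1", "C5", "C8"])
fil_b = filter_of(["C1", "C7", "C9"])
common = fil_a & fil_b
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```

Each filter can contain a number of elements proportional to the size of the tree, which is the cost the complexity discussion attributes to this approach.
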
Complexity
Complexity of Naive Approach
The number of RMQs is O(|A| · |B|), one per pair of elements from A and B.
Complexity of Improved Approach
The cardinality of A ∪ B is at most |A| + |B|, hence the number of pairs of consecutive elements is O(|A| + |B|), and thus the number of RMQs of consecutive elements is O(|A| + |B|).
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than O(|Leaves(T)|) for intersecting antichains directly.

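A minimal Python sketch of one way to get a linear number of range minimum queries. It merges the Euler tour positions of A and B in sorted order and queries only consecutive positions whose endpoints come from different antichains. This is an assumption about how the consecutive-element idea can be realised, not a transcription of the algorithm in the slides; the tree, the name `TOP` for ⊤ and all function names are illustrative.

```python
# Tree from the slides (parent -> children); "TOP" stands for the root ⊤.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (nodes, depths) of the Euler tour, 1-based (index 0 is unused)."""
    nodes, depths = [None], [None]

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, i, j):
    """Position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p])


def merged_meet(A, B):
    """Query consecutive positions of the merged list instead of all pairs."""
    nodes, depths = euler_tour("TOP")
    first, last = {}, {}
    for pos in range(1, len(nodes)):
        first.setdefault(nodes[pos], pos)
        last[nodes[pos]] = pos

    pos_a = {first[a] for a in A}
    pos_b = {first[b] for b in B}
    merged = sorted(pos_a | pos_b)        # at most |A| + |B| positions

    positions = pos_a & pos_b             # shared elements are kept as they are
    for x, y in zip(merged, merged[1:]):  # consecutive pairs only
        if (x in pos_a and y in pos_b) or (x in pos_b and y in pos_a):
            positions.add(rmq(depths, x, y))

    found = {nodes[p] for p in positions}

    def is_proper_ancestor(u, v):
        return u != v and first[u] <= first[v] <= last[u]

    antichain = {u for u in found
                 if not any(is_proper_ancestor(u, v) for v in found)}
    return positions, antichain


positions, antichain = merged_meet(["C1", "C5", "C8"], ["C1", "C7", "C9"])
print(sorted(positions))   # [4, 8, 15, 19, 21]
print(sorted(antichain))   # ['C1', 'C13']
```

On the running example this issues four queries instead of nine and returns the same antichain as the naive approach. The sort adds its own cost unless the positions are already kept in order.
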
About complexity of the approach
- The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than O(|Leaves(T)|) for intersecting antichains directly.
- The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
- When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
- But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

- Motivation
- Background
- Fundamentals of Formal Concept Analysis
- Pattern Structures
- RDF Pattern Structures
- Creating Views over RDF-Graphs
- View By
- Completing RDF Data
- Motivation
- Methodology
- Experimentation & Evaluation
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
- Heterogeneous Pattern Structures
- RDF Pattern Structures

Heterogeneous Pattern Structures
K_A K_B K_C K_D K_E K_dbo:productionStartYear
a b c d e f g
Reventon: × × × × × ×, ⟨[2008, 2008]⟩
Countach: × × × × ×, ⟨[1974, 1974]⟩
350GT: × × × × ×, ⟨[1963, 1963]⟩
400GT: × × × ×, ⟨[1965, 1965]⟩
Islero: × × × ×, ⟨[1967, 1967]⟩
Veneno: × ×, ⟨[2012, 2012]⟩
Aventador Roadster: × ×, -

Let A1 = {350GT, 400GT, Islero}, then
- (A1)∘∘
  - (A1)∘ = {a, b, c, d}
  - {a, b, c, d}∘ = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)∘∘ = {Reventon, Countach, 350GT, 400GT, Islero}
- (A1)″
  - (A1)′ = [1963, 1967] and [1963, 1967]′ = A1
  - (A1)″ = A1
- (A1)⋄⋄ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1
- Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)
"Automobiles manufactured by Lamborghini between 1963 and 1967"

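The interval part of this example can be traced with a small Python sketch. The production start years are copied from the table. The closure of A1 in the binary part is taken as stated on the slide, because the column positions of the crosses are not available here. The function names `interval_of` and `objects_in` are hypothetical.

```python
# Production start years copied from the table; None marks the missing value.
START_YEAR = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    "Aventador Roadster": None,
}

# Closure of A1 in the binary part, as stated on the slide.
BINARY_CLOSURE_A1 = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_of(objects):
    """Smallest interval covering the start years of the given objects."""
    years = [START_YEAR[g] for g in objects]
    return (min(years), max(years))


def objects_in(interval):
    """Objects whose start year is known and lies inside the interval."""
    low, high = interval
    return {g for g, year in START_YEAR.items()
            if year is not None and low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)                # (1963, 1967)
interval_closure = objects_in(interval)   # equals A1
combined = BINARY_CLOSURE_A1 & interval_closure
print(interval, combined == A1)           # (1963, 1967) True
```

The last intersection mirrors the slide: the binary closure alone would also admit Reventon and Countach, and the interval [1963, 1967] filters them out again.
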
The proposition of [Carpineto and Romano 1996]
- Let K = (G, M, I) be a formal context.
- An extended set of attributes M* is organized in a subsumption hierarchy based on a partial ordering ≤_M*.
- Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}.

Entities S: C1 C2 C4 C5 C6 C7 C8 C9
s1: x x x
s2: x x x
s3: x x
s4: x x x
s5: x x

[Figure: subsumption hierarchy with top element ⊤, inner nodes C10, C11, C12, C13, C14, C15 and attributes C1, C2, C4, C5, C6, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]
- In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
- More formally, it is assumed that the attribute set (M, ≤_M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
- Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M, x ≤ y}.
- The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
- Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

The proposition of [Ganter and Kuznetsov 2001]
- In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set.
- For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2: m ≤ ac1 and m ≤ ac2}.

Reduction of LCS to RMQ
- The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
- Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time
The problem of finding such a position can be solved in O(n) preprocessing time and in O(1) time per query (where n is the number of elements in the array).

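To make the query concrete, here is a minimal Python sketch of a range minimum structure on the array from the slide. It is a sparse table, which needs O(n log n) preprocessing rather than the O(n) bound quoted above, so it is a simpler stand-in and not the linear-time scheme. The class name `SparseTableRMQ` and its methods are hypothetical.

```python
class SparseTableRMQ:
    """Sparse table: O(n log n) preprocessing, O(1) time per query."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] = 0-based index of the minimum in values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, i, j):
        """1-based position of the minimum between 1-based positions i and j."""
        lo, hi = min(i, j) - 1, max(i, j) - 1
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping blocks of length 2**k cover the whole range.
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        best = left if self.values[left] <= self.values[right] else right
        return best + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(1, 5))  # 3
print(rmq.query(4, 5))  # 5
```

On the slide's array the minimum over positions 1 to 5 is the value 0 at position 3, and over positions 4 to 5 it is the value 2 at position 5.
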
The proposition of [Carpineto and Romano 1996]
- Let $K = (G, M, I)$ be a formal context.
- The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.
- Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.
[Table: formal context with entities s1 to s5 as rows and attributes C1, C2, C4, C5, C6, C7, C8, C9 as columns; s1 has C1, C2, C7 and s2 has C6, C8, C9.]
[Figure: subsumption hierarchy over the attributes, with top element ⊤ and nodes C1, C2, C4 to C15.]
The proposition of [Ganter and Kuznetsov 2001]
- In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
- More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
- Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$, with $y$ ranging over these maximal elements.
- The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.
- Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
- In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
- For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$.
Reduction of LCS to RMQ
- Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.
- Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time
The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
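
To make the RMQ definition concrete, here is a minimal sketch in Python. It uses a sparse table, which is an assumption on my part and not the structure behind the $O(n)$ bound quoted above: its preprocessing costs $O(n \log n)$, while queries are still $O(1)$. The function names `build_sparse_table` and `rmq` are hypothetical.

```python
def build_sparse_table(values):
    """table[k][i] = index of the leftmost minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the leftmost minimum of values[lo..hi], bounds inclusive."""
    if lo > hi:
        lo, hi = hi, lo
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# The slides use 1-based positions, hence the +1 on the results.
print(rmq(array, table, 0, 4) + 1)  # minimum over positions 1..5 is at position 3
print(rmq(array, table, 3, 4) + 1)  # minimum over positions 4..5 is at position 5
```

On the example array, the minimum over the whole range sits at position 3, where the value is 0.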
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions in the array below are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

- $RMQ(4, 4) = 4$, partial result $\{4\}$
- $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
- $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$
- Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$ once the positions of nodes that subsume another node of the result are dropped, i.e., the antichain $\{C_1, C_{13}\}$.
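
The walk-through above can be reproduced with a short Python sketch. Everything named here (`CHILDREN`, `euler_tour`, `naive_intersection`, the label `"TOP"` for ⊤) is a hypothetical encoding chosen for illustration, and the RMQ is a plain linear scan, not a constant-time structure. LCS is read here as the least common subsumer, which in a tree is the lowest common ancestor.

```python
from itertools import product

# Hypothetical encoding of the example tree: parent -> children, in drawing order.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths for a tour started at root."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")
FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)  # 1-based first occurrence, as on the slides


def rmq(i, j):
    """1-based position of the leftmost minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])


def is_ancestor(u, v):
    """True if u is a proper ancestor of v."""
    return u != v and NODES[rmq(FIRST[u], FIRST[v]) - 1] == u


def naive_intersection(a, b):
    """Query every pair (x, y) with x in a and y in b, then keep the most specific nodes."""
    positions = {rmq(FIRST[x], FIRST[y]) for x, y in product(a, b)}
    candidates = {NODES[p - 1] for p in positions}
    result = {c for c in candidates if not any(is_ancestor(c, d) for d in candidates)}
    return positions, result


positions, result = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(positions))  # [4, 8, 15, 19, 21]
print(sorted(result))     # ['C1', 'C13']
```

The tour built by `euler_tour` matches the 25-entry array of the example, and the pairwise queries visit $|A| \cdot |B| = 9$ pairs.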
An associated scaling
- Scale the antichains to the corresponding filters.
- A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain.

[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9).]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$
$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$
$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
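
The scaling can be sketched in a few lines of Python. The `PARENT` map, the label `"TOP"` for ⊤, and the function names are hypothetical; the sketch assumes the poset is the example tree, so the filter of a node is the node plus its ancestors.

```python
# Hypothetical child -> parent map for the example tree; "TOP" stands for the top element.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Fil(antichain): every element above or equal to some element of the antichain."""
    return set().union(*(up_set(x) for x in antichain))


def minimal_elements(elements):
    """Elements of the set that have no proper descendant in the set."""
    return {e for e in elements if not any(e != d and e in up_set(d) for d in elements)}


def intersect_by_scaling(a, b):
    """Intersect the two filters, then read the antichain off the minimal elements."""
    return minimal_elements(filter_of(a) & filter_of(b))


print(sorted(intersect_by_scaling({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

Each filter can contain a large part of the tree, which is why this route touches more elements than the RMQ-based one.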
Complexity
Complexity of Naive Approach
The number of RMQs is $O(|A| \cdot |B|)$, one per pair of elements from $A$ and $B$.
Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.
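
A possible reading of the improved approach, as a hedged Python sketch: the positions of both antichains are merged in sorted order and only neighbouring positions are queried. Skipping neighbours that both belong to the same antichain only is my assumption, since the slides do not spell out this detail. The arrays are copied from the running example, `"TOP"` stands for ⊤, and the linear-scan `rmq` is a stand-in for a constant-time structure.

```python
# Tour and depth arrays copied from the running example ("TOP" stands for the top element).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)  # 1-based first occurrence


def rmq(i, j):
    """Leftmost position of minimal depth in [i, j], found by a linear scan."""
    return min(range(i, j + 1), key=lambda p: DEPTHS[p - 1])


def improved_positions(a, b):
    """Query only neighbouring positions of the sorted union of both antichains."""
    pa, pb = {FIRST[x] for x in a}, {FIRST[y] for y in b}
    merged = sorted(pa | pb)
    found = set(pa & pb)  # an element shared by both antichains subsumes itself
    for left, right in zip(merged, merged[1:]):
        # Assumption: neighbours that both come from one antichain only are skipped.
        same_side_only = {left, right} <= pa - pb or {left, right} <= pb - pa
        if not same_side_only:
            found.add(rmq(left, right))
    return found


print(sorted(improved_positions({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # [4, 8, 15, 19, 21]
```

At most $|A| + |B| - 1$ queries are issued. The returned positions still have to be reduced to the most specific nodes to obtain the final antichain.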
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.
About complexity of the approach
- The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
- When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
- But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
- Motivation
- Background
-
- Fundamentals of Formal Concept Analysis
- Pattern Structures
-
- RDF Pattern Structures
- Creating Views over RDF-Graphs
-
- View By
-
- Completing RDF Data
-
- Motivation
- Methodology
- Experimentation amp Evaluation
-
- Data Analysis through RV-Xplorer
- Conclusion and Perspectives
- References
-
- Heterogeneous Pattern Structures
- RDF Pattern Structures
-
Reduction of LCS to RMQ
sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query
sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array
Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5
Computational Time
The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)
Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
Naive Approach
For A = {C1, C5, C8} and B = {C1, C7, C9}, the positions in the array are A = {4, 12, 20} and B = {4, 18, 22}.
D:          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
RMQ(4, 4) = 4, giving {4}
RMQ(4, 18) = 15, giving {4, 15}
RMQ(20, 18) = 19, giving {4, 15, 19}
Result = {4, 8, 15, 19, 21}, which reduces to {4, 21}, i.e. the antichain {C1, C13}: positions 8 (C12) and 15 (⊤) lie above C1, and positions 19 and 21 both denote C13.
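The naive approach on this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the tree is read off the node sequence on the slide, `"TOP"` stands for the root ⊤, the RMQ is a plain linear scan for readability, and all function names are hypothetical.

```python
from itertools import product

# Tree of the running example: parent -> ordered children ("TOP" is the root).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def euler_tour(root):
    """Return the visited nodes and their depths (the array D), as 0-based lists."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # return to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, lo, hi):
    """1-based position of the minimal depth in lo..hi (linear scan, for clarity only)."""
    lo, hi = min(lo, hi), max(lo, hi)
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def ancestors(node):
    """Strict ancestors of a node, up to the root."""
    out = set()
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out


def naive_intersection(a, b):
    nodes, depths = euler_tour("TOP")
    first = {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)  # position of the first occurrence
    # One RMQ per pair of A x B, hence O(|A| * |B|) queries.
    positions = {rmq(depths, first[x], first[y]) for x, y in product(a, b)}
    lcs_nodes = {nodes[p - 1] for p in positions}
    # Keep only the minimal elements: drop every node that is above another one.
    covered = set().union(*(ancestors(n) for n in lcs_nodes))
    return positions, lcs_nodes - covered


positions, antichain = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(positions))  # [4, 8, 15, 19, 21]
print(sorted(antichain))  # ['C1', 'C13']
```

The tour produced by `euler_tour` reproduces the 25-entry node and depth rows of the slide, so the positions printed here can be compared with it directly.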
An associated scaling
- Scale the antichains to the corresponding filters.
- A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain.
Tree:
⊤
  C12
    C10
      C1 C2
    C11
      C4 C5
  C6
    C13
      C7 C8 C9
A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}
B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}
Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
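The scaling to filters can be illustrated with a short Python sketch. It assumes the tree of the running example, writes the root ⊤ as `"TOP"`, and uses hypothetical helper names (`filter_of`, `minimal_elements`); it is meant only to reproduce the computation shown on the slide.

```python
# Tree of the running example: parent -> children ("TOP" is the root).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def filter_of(antichain):
    """Fil(A): every element of A together with all of its ancestors."""
    fil = set()
    for node in antichain:
        fil.add(node)
        while node in PARENT:
            node = PARENT[node]
            fil.add(node)
    return fil


def minimal_elements(upward_closed):
    """Minimal elements of an upward-closed set of tree nodes.

    In such a set a node is non-minimal exactly when one of its children
    also belongs to the set.
    """
    with_child_inside = {PARENT[n] for n in upward_closed if n in PARENT}
    return upward_closed - with_child_inside


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```

Both filters can contain up to all nodes of the tree, which is the source of the higher cost compared with working on the antichains directly.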
Complexity
Complexity of Naive Approach
The number of RMQs is O(|A| |B|), one for each pair of elements of A and B.
Complexity of Improved Approach
The cardinality of A ∪ B is at most |A| + |B|, hence the number of pairs of consecutive elements is O(|A| + |B|), and thus the number of RMQs of consecutive elements is O(|A| + |B|).
Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is higher than the O(|Leaves(T)|) needed for intersecting antichains directly.
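One possible reading of the improved approach is sketched below in Python: the elements of A ∪ B are sorted by their position in the depth array, and one RMQ is issued per pair of consecutive elements. Restricting the queries to consecutive pairs with one element from each antichain, and adding the elements common to both, are assumptions of this sketch that the slides do not state. `"TOP"` stands for ⊤, the RMQ is a linear scan for readability, and all names are hypothetical.

```python
# Tree of the running example: parent -> ordered children ("TOP" is the root).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def euler_tour(root):
    """Return the visited nodes and their depths, as 0-based lists."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, lo, hi):
    """1-based position of the minimal depth in lo..hi (linear scan, for clarity only)."""
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def ancestors(node):
    out = set()
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out


def improved_intersection(a, b):
    nodes, depths = euler_tour("TOP")
    first = {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)
    # Merge A and B, sorted by position, remembering where each element comes from.
    merged = sorted((first[x], x in a, x in b) for x in set(a) | set(b))
    # An element present in both antichains is its own LCS.
    positions = {p for p, in_a, in_b in merged if in_a and in_b}
    # At most |A| + |B| - 1 consecutive pairs, hence O(|A| + |B|) queries.
    for (p, p_a, p_b), (q, q_a, q_b) in zip(merged, merged[1:]):
        if (p_a and q_b) or (p_b and q_a):
            positions.add(rmq(depths, p, q))
    lcs_nodes = {nodes[p - 1] for p in positions}
    covered = set().union(*(ancestors(n) for n in lcs_nodes))
    return positions, lcs_nodes - covered


positions, antichain = improved_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(positions))  # [4, 8, 15, 19, 21]
print(sorted(antichain))  # ['C1', 'C13']
```

On the running example this issues four RMQs instead of nine and yields the same set of positions as the result line of the Naive Approach slide.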
About complexity of the approach
- The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is higher than the O(|Leaves(T)|) needed for intersecting antichains directly.
- The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
- When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
- But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
Naive Approach
For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the Euler tour are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Computing one RMQ for every pair of positions $(a, b) \in A \times B$ gives Result $= \{4, 8, 15, 19, 21\}$, i.e. the nodes $C_1$, $C_{12}$, $\top$, $C_{13}$ and $C_{13}$. Keeping only the minimal elements leaves $\{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$.
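The pairwise computation on this slide can be reproduced with the following Python sketch. The tour and depth arrays are copied from the table above, with "TOP" standing for $\top$. Everything else is an assumption made for illustration: the function names are hypothetical, and `rmq` is a plain linear scan rather than the constant-time structure a real implementation would use.

```python
from itertools import product

# Euler tour and depth array from the slide (positions are 1-based).
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last position of every node in the tour, used for ancestor tests.
FIRST, LAST = {}, {}
for pos, node in enumerate(TOUR, start=1):
    FIRST.setdefault(node, pos)
    LAST[node] = pos


def rmq(i, j):
    """Position of the minimum depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_strict_ancestor(u, v):
    """True if u lies strictly above v in the tree."""
    return u != v and FIRST[u] <= FIRST[v] <= LAST[u]


def naive_intersection(a_pos, b_pos):
    """Return (candidate positions, resulting antichain) from all |A|*|B| RMQs."""
    candidates = {rmq(i, j) for i, j in product(a_pos, b_pos)}
    nodes = {TOUR[p - 1] for p in candidates}
    minimal = {n for n in nodes
               if not any(is_strict_ancestor(n, m) for m in nodes)}
    return candidates, minimal


positions, antichain = naive_intersection([4, 12, 20], [4, 18, 22])
print(sorted(positions), sorted(antichain))
```

The ancestor test relies on a standard property of Euler tours: a node lies above another exactly when the first occurrence of the lower node falls between the first and last occurrence of the upper one.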