interactive knowledge discovery over web of data

Upload: mehwish-alam

Post on 18-Feb-2017

TRANSCRIPT

Page 1: Interactive Knowledge Discovery over Web of Data

Interactive Knowledge Discovery over Web of Data

Equipe Orpailleur

Supervised by Amedeo Napoli and Malika Smail-Tabbone

Mehwish Alam

Mehwish Alam Interactive Knowledge Discovery over Web of Data 1

1 Motivation

Towards Web of Data

Query: List of Turing Award Winners

A more specific query: List of American Turing Award Winners

Requires resource integration

• List of Turing Award Winners
• List of American Computer Scientists

From Web of Documents to Web of Data

Characteristics of Web of Documents

• Unstructured Data
• Human Processable
• Not easily processed by machines

[Figure: RDF graph over the nodes Leslie Lamport, Turing Award, Scientist, United States, Washington DC and English, with edges labelled dbo:birthPlace, dbp:award, rdf:type, dbo:capital and dbo:officialLanguage]

Google Knowledge Graph

Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect major area of research for an author
• given a paper, is it possible to retrieve similar papers published?

What do we need?

• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

Interactive Knowledge Discovery + Completion

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object Concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute Concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
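The derivation operators and the object and attribute concepts can be sketched in a few lines of Python. This is only an illustration: the incidence relation in `CONTEXT` is an assumption, inferred from the two example concepts on the slide and from the support and confidence values quoted for the same context, and the function names are hypothetical.

```python
# Formal context (assumed incidence, inferred from the slide examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """Object concept (g'', g')."""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """Attribute concept (m', m'')."""
    a = extent({m})
    return a, intent(a)


print(object_concept("g2"))
print(attribute_concept("m1"))
```

With this context, `object_concept("g2")` and `attribute_concept("m1")` give the two example concepts quoted on the slide.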

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule $m_2 \rightarrow m_3$

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
$\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

$\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
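The support and confidence figures above can be checked with a short Python sketch. The incidence relation in `CONTEXT` is an assumption inferred from the numbers on the slides, and the helper names are hypothetical.

```python
# Formal context (assumed incidence, inferred from the slide examples).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """sigma(X -> Y) = |X' ∩ Y'| / |G|."""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'| (assumes X' is not empty)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}), implication_holds({"m2"}, {"m3"}))
```

An implication is simply an association rule whose confidence is 1, which is why $m_3 \Rightarrow m_2$ holds while $m_2 \rightarrow m_3$ remains an association rule only.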

Complex Data

Formal Concept Analysis cannot deal with such complex data

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5

Numeric Data

Graphs and Molecular Structure

Syntactic Tree

Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
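A minimal Python sketch of the interval pattern structure may help to follow the example. It assumes the three-object numeric table used on the pattern-structure slides (g1 to g3); the names `DELTA`, `meet`, `subsumed`, `description` and `objects_of` are hypothetical, and subsumption is taken as $d \sqsubseteq e \iff d \sqcap e = d$.

```python
from functools import reduce

# delta(g): each numeric value v is encoded as the interval [v, v].
DELTA = {
    "g1": [(5, 5), (7, 7), (6, 6)],
    "g2": [(6, 6), (8, 8), (4, 4)],
    "g3": [(4, 4), (8, 8), (5, 5)],
}


def meet(d1, d2):
    """Similarity of two interval vectors: component-wise convex hull."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]


def subsumed(d, e):
    """d is subsumed by e when the meet of d and e is d itself."""
    return meet(d, e) == d


def description(objects):
    """A^box: the meet of the descriptions of a non-empty set of objects."""
    return reduce(meet, (DELTA[g] for g in objects))


def objects_of(d):
    """d^box: all objects whose description is subsumed by d."""
    return {g for g, dg in DELTA.items() if subsumed(d, dg)}


d = description(["g1", "g2"])
print(d, objects_of(d))
```

Here `g3` is excluded because its first value, 4, falls outside the interval [5, 6], so the pair ({g1, g2}, ⟨[5, 6], [7, 8], [4, 6]⟩) is closed on both sides.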

Complex Data

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5

Interval Pattern Structures [Kaytoue et al. 2011]

Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

Syntactic Tree [Leeuwenberg et al. 2015]

Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF-Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) linked to Object ($U \cup B \cup L$) by a predicate ($U$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: RDF graph over the nodes NapoliLS97, Amedeo_Napoli, title1 and Classification, with edges labelled dc:creator, dc:title and dc:subject]

To avoid confusion, we will call the objects of FCA entities.
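The typing constraint in the definition of an RDF triple can be illustrated with a small Python sketch. The classes `URI`, `BlankNode` and `Literal` and the function `is_rdf_triple` are hypothetical names introduced for this illustration, and the pairing of predicates with objects in `GRAPH` is an assumption read off the publication example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_rdf_triple(s, p, o):
    """Check (s, p, o) against (U ∪ B) x U x (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


paper = URI("NapoliLS97")
# Assumed pairing of predicates and objects for the publication example.
GRAPH = [
    (paper, URI("dc:creator"), URI("Amedeo_Napoli")),
    (paper, URI("dc:title"), Literal("title1")),
    (paper, URI("dc:subject"), URI("Classification")),
]

print(all(is_rdf_triple(*t) for t in GRAPH))
```

A literal is only allowed in the object position, so a triple such as `(Literal("title1"), URI("dc:title"), paper)` is rejected by the check.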

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy |T|, with root ⊤ and nodes C1, C2 and C4 to C15]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets?

• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

The tree is traversed depth-first, and each visit to a node appends the node and its depth to the array D:

Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
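The construction of the depth array and its use for computing a lowest common ancestor (LCA) can be sketched as follows. This is an illustration under stated assumptions: the `TREE` dictionary encodes the taxonomy as read from the traversal above, with "T" standing for the root ⊤, the range minimum is computed naively rather than with a preprocessed RMQ structure, and indices are 0-based whereas the slides count from 1.

```python
# Children of each inner node; "T" stands for the root (top) node.
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Depth-first traversal recording every visit of a node and its depth."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in TREE.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # record the return to the parent
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("T")

# First position of every node in the tour (0-based).
FIRST = {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)


def lca(u, v):
    """LCA = node of minimal depth between the first visits of u and v."""
    i, j = sorted((FIRST[u], FIRST[v]))
    k = min(range(i, j + 1), key=DEPTHS.__getitem__)  # naive RMQ
    return NODES[k]


print(DEPTHS)
print(lca("C1", "C5"), lca("C7", "C9"), lca("C1", "C7"))
```

The naive `min` over the range costs linear time per query; a sparse table or a similar preprocessed structure would answer each query in constant time, at the price of extra memory.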

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Each class is replaced by the index of its first occurrence in the array D above:

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. $\{C1, C13\}$
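The result of this example can be reproduced with a short Python sketch. It is not the merged-list procedure of the slides: as a simplifying assumption, it computes the LCA of every pair taken from the two sets (quadratic in their sizes) and then keeps only the most specific classes. The traversal arrays are copied from the slides, with "T" standing for ⊤ and 0-based indices; `lca` and `similarity` are hypothetical names.

```python
# Euler tour of the taxonomy and the matching depth array, as on the slides.
NODES = (
    "T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
    "C6 C13 C7 C13 C8 C13 C9 C13 C6 T"
).split()
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First position of every node in the tour (0-based).
FIRST = {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)


def lca(u, v):
    """Node of minimal depth between the first visits of u and v."""
    i, j = sorted((FIRST[u], FIRST[v]))
    k = min(range(i, j + 1), key=DEPTHS.__getitem__)  # naive RMQ
    return NODES[k]


def similarity(a, b):
    """Most specific common generalisations of two sets of classes."""
    candidates = {lca(x, y) for x in a for y in b}
    # Drop every candidate that is a proper ancestor of another candidate.
    return {
        c for c in candidates
        if not any(c != d and lca(c, d) == c for d in candidates)
    }


print(similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

The candidate set is {C1, C12, T, C13}; C12 and the root are dropped because each is an ancestor of C1, which leaves the same two classes as positions 4 and 21 in the slide.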

Resulting Pattern Concept Lattice

[Figure: Pattern Concept lattice over the concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: Pattern Concept lattice over the concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4 Creating Views over RDF Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: query graph over the variables ?paper, ?author, ?title and ?keywords, with edges labelled dc:creator, dc:title and dc:subject]

Mapping $\mu : V \rightarrow U$

Given the set of URIs $U$ and the set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: matching RDF graph over the nodes NapoliLS97, Amedeo_Napoli, title1 and Classification, with edges labelled dc:creator, dc:title and dc:subject]

Lattice-Based View Access [Alam et al. CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
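How a VIEW BY clause turns a table of query answers into a formal context can be sketched in Python. This is an illustration only: `view_by` is a hypothetical helper, not part of any SPARQL engine, and `ROWS` is a small made-up set of answers in the shape of the result table.

```python
# Illustrative answers of the query; each row binds the selected variables.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, object_variable):
    """Build a formal context from query answers.

    The values bound to `object_variable` become the objects (G); every
    (variable, value) pair bound to another variable becomes an attribute.
    """
    context = {}
    for row in rows:
        attributes = context.setdefault(row[object_variable], set())
        attributes.update(
            (var, value) for var, value in row.items() if var != object_variable
        )
    return context


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(by_title["title2"])
print(by_author["author1"])
```

Calling the same helper with "author" instead of "title" yields a different context, and therefore a different lattice, over the same answers.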

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: query graph with the nodes Person, Berlin and x ≤ 1900, and edges labelled rdf:type, dbp:birthPlace and dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: query graph with the nodes x, FrenchFilm, Film and France, and edges labelled dcterms:subject, rdf:type and dbo:hasCountry]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built, after scaling, from RDF triples taken from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule                     Confidence  Support  Meaning
$d \Rightarrow c$        100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$        100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$     100%        2        All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$  100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
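The detection of entities that need completion can be sketched in Python. The context below is an assumption restricted to the attributes a to d: the rule table implies that all seven persons carry a and b while only Person1 to Person5 carry c and d. The function names are hypothetical.

```python
# Assumed incidence for attributes a-d, derived from the rule table.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT["Person6"] = {"a", "b"}
CONTEXT["Person7"] = {"a", "b"}


def extent(attributes):
    """Entities that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'| (assumes X' is not empty)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def completion_candidates(x, y):
    """Entities satisfying X but missing part of Y, with what they miss."""
    return {
        g: set(y) - attrs
        for g, attrs in CONTEXT.items()
        if set(x) <= attrs and not set(y) <= attrs
    }


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(completion_candidates({"a", "b"}, {"c", "d"}))
```

If a domain expert accepts $a, b \equiv c, d$ as a definition, the missing attributes listed for each candidate are the triples to add to the data.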

Possible Scenario [Alam et al. IJCAI 2015]

[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition $X \equiv Y$?" and, if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS = Front_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph for the 350GT, with nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edges dc:subject, rdf:type, dbo:manufacturer and dbo:productionYear]

Why Pattern Structures

Heterogeneous context with numeric values

Contexts $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a, b, c, d, e, f, g) and $K_{dbo:productionStartYear}$ (intervals)

Entity | a to g | dbo:productionStartYear
Reventon | × × × × × × | $\langle [2008, 2008] \rangle$
Countach | × × × × × | $\langle [1974, 1974] \rangle$
350GT | × × × × × | $\langle [1963, 1963] \rangle$
400GT | × × × × | $\langle [1965, 1965] \rangle$
Islero | × × × × | $\langle [1967, 1967] \rangle$
Veneno | × × | $\langle [2012, 2012] \rangle$
Aventador Roadster | × × | -

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: p is an object relation
  - $range(p) \subseteq L$: p is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then
• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
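The closure in the example above can be retraced with a small sketch. Only the numeric part is computed here; the closure of the binary part is copied from the slide rather than derived, because the positions of the crosses in the context are not given. The names `YEARS`, `interval_of` and `entities_in` are illustrative, not part of the original work.

```python
# Production start years taken from the heterogeneous context.
# Aventador Roadster has no year in the table, so it is left out.
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

def interval_of(entities):
    """Smallest interval covering the years of the given entities (assumes a non-empty set)."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))

def entities_in(interval):
    """All entities whose year falls inside the interval."""
    lo, hi = interval
    return {e for e, y in YEARS.items() if lo <= y <= hi}

A1 = {"350GT", "400GT", "Islero"}

# Closure of A1 in the binary contexts, as stated on the slide (assumption: not recomputed).
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

# The heterogeneous closure is the intersection of the per-relation closures.
numeric_closure = entities_in(interval_of(A1))
heterogeneous_closure = BINARY_CLOSURE & numeric_closure
```

Under these assumptions `heterogeneous_closure` equals `A1`, which is why `A1` is the extent of a heterogeneous pattern concept.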

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1 | x | x |   |   |   | x |   |
s2 |   |   |   |   | x |   | x | x
s3 |   |   | x | x |   |   |   |
s4 |   |   | x |   |   | x | x |
s5 |   |   |   |   |   |   | x | x

[Figure: attribute hierarchy rooted at ⊤ with nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal Idl can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$
• The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure $(G, (D, \sqcap), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where n is the number of elements in the array)
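As an illustration of the RMQ problem, here is a minimal sketch based on a sparse table. Note that this is a simpler, swapped-in technique: it answers queries in $O(1)$ but needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted above. The class name `SparseTableRMQ` is hypothetical, and positions are 0-based in the code.

```python
from typing import List

class SparseTableRMQ:
    """Range Minimum Query via a sparse table (illustrative helper, 0-based indices)."""

    def __init__(self, values: List[int]) -> None:
        self.values = values
        n = len(values)
        # table[k][i] holds the index of the minimum of values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if values[left] <= values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, lo: int, hi: int) -> int:
        """Index of the minimal value in values[lo..hi], both ends inclusive."""
        if lo > hi:
            lo, hi = hi, lo
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping blocks of length 2**k cover the whole range.
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        return left if self.values[left] <= self.values[right] else right

rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(0, 4) + 1)  # 1-based position of the minimum over the whole array: 3
```

For the array on the slide, the minimum over positions 1 to 5 is the value 0 at position 3.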

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D:          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, giving {4}
RMQ(4, 18) = 15, giving {4, 15}
RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}
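The naive approach can be sketched as follows, querying every pair of elements from A and B. The tour and depth arrays are copied from the slide, with `"T"` standing for the root ⊤. The linear-scan `rmq` and the final filtering step (dropping any candidate that is a proper ancestor of another one) are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Tour of the taxonomy and the depth of each visited node, as on the slide.
TOUR = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "T"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
         1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

def rmq(i, j):
    """1-based position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])

def lca(u, v):
    """Least common ancestor of two nodes, via their first positions in the tour."""
    return TOUR[rmq(TOUR.index(u) + 1, TOUR.index(v) + 1) - 1]

def naive_intersection(a, b):
    """Query all |A| * |B| pairs, then keep only the most specific candidates."""
    candidates = {lca(u, v) for u, v in product(a, b)}
    return {n for n in candidates
            if not any(m != n and lca(n, m) == n for m in candidates)}

result = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

Here `result` is `{"C1", "C13"}`, matching positions 4 and 21 on the slide.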

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
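The scaling above can be reproduced with a short sketch. The `PARENT` map encodes the tree shown on the slide, with `"T"` standing for ⊤; the function names are illustrative.

```python
# Child -> parent map of the taxonomy tree ("T" is the root).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def ancestors_or_self(node):
    """The node together with all of its ancestors up to the root."""
    out = {node}
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out

def filter_of(antichain):
    """Filter of an antichain: every element above (or equal to) one of its members."""
    result = set()
    for node in antichain:
        result |= ancestors_or_self(node)
    return result

def minimal_elements(nodes):
    """Most specific nodes: those with no other node of the set below them."""
    return {n for n in nodes
            if not any(m != n and n in ancestors_or_self(m) for m in nodes)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
antichain = minimal_elements(common)
```

Each filter can contain up to all nodes of the tree, which is where the $O(|T|)$ cost of this approach comes from.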

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements from A and B

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\text{Leaves}(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ: the size of a filter is $O(|T|)$, so intersecting two antichains by means of a scaling costs $O(|T|)$, which is higher than the $O(|\text{Leaves}(T)|)$ needed for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

1 Motivation

Towards Web of Data

Query

List of Turing Award Winners

A more specific Query

List of American Turing Award Winners

Requires Resource Integration

• List of Turing Award Winners
• List of American Computer Scientists

From Web of Documents to Web of Data

Characteristics of Web of Documents

• Unstructured Data
• Human Processable
• Not easily processed by machines

[Figure: RDF graph with nodes Leslie Lamport, Turing Award, Scientist, United States, Washington DC and English, and edges dbo:birthPlace, dbp:award, rdf:type, dbo:capital and dbo:officialLanguage]

Google Knowledge Graph


Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect the major area of research for an author
• given a paper, is it possible to retrieve similar published papers?

What do we need?

• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization

[Figure: Interactive Knowledge Discovery + Completion]

Related Work

Our contributions are in the direction of the following works

• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al 2011]
• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - G is a set of objects
  - M is a set of attributes
  - I is a binary relation, $I \subseteq G \times M$
• Galois connection
  - $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$
  - $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

   | m1 | m2 | m3
g1 | ×  |    |
g2 | ×  | ×  |
g3 |    | ×  | ×
g4 |    | ×  | ×
g5 | ×  | ×  | ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object Concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g2, g5\}, \{m1, m2\})$ for g2
• Attribute Concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g1, g2, g5\}, \{m1\})$ for m1
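The derivation operators and the two example concepts can be checked with a short sketch. The cross positions in `CONTEXT` are an assumption: they are inferred from the worked examples in the talk (the object concept of g2, the attribute concept of m1, and the rule $m2 \rightarrow m3$). The function names are illustrative.

```python
# Formal context as a map from each object to its attributes (assumed cross positions).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}

def intent(objects):
    """A' : attributes shared by all the given objects."""
    return {m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objects)}

def extent(attributes):
    """B' : objects having all the given attributes."""
    return {g for g, ms in CONTEXT.items() if set(attributes) <= ms}

def object_concept(g):
    """(g'', g') for a single object g."""
    g_intent = intent({g})
    return extent(g_intent), g_intent

def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    m_extent = extent({m})
    return m_extent, intent(m_extent)

g2_concept = object_concept("g2")
m1_concept = attribute_concept("m1")
```

With this context, `g2_concept` is `({"g2", "g5"}, {"m1", "m2"})` and `m1_concept` is `({"g1", "g2", "g5"}, {"m1"})`, as on the slide.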

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule: $m2 \rightarrow m3$
  - $\sigma(m2 \rightarrow m3) = 3/5 = 60\%$
  - $conf(m2 \rightarrow m3) = 3/4 = 75\%$

Implication: $m3 \Rightarrow m2$
  - $conf(m3 \Rightarrow m2) = 3/3 = 100\%$
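The support and confidence values above can be recomputed with a small sketch. As before, the cross positions in `CONTEXT` are inferred from the examples in the talk and are therefore an assumption; `support`, `confidence` and `implication_holds` are illustrative names.

```python
from fractions import Fraction

# Formal context (assumed cross positions): object -> attributes.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}

def extent(attributes):
    """X' : objects having all attributes of X."""
    return {g for g, ms in CONTEXT.items() if set(attributes) <= ms}

def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))

def confidence(x, y):
    """|X' ∩ Y'| / |X'| (assumes X' is not empty)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))

def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)

rule_support = support({"m2"}, {"m3"})        # 3/5
rule_confidence = confidence({"m2"}, {"m3"})  # 3/4
```

An implication is exactly an association rule with confidence 1, which is the case for $m3 \Rightarrow m2$ but not for $m2 \Rightarrow m3$.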

Complex Data

Formal Concept Analysis cannot deal with such complex data

Numeric Data

   | m1 | m2 | m3
g1 | 5 | 7 | 6
g2 | 6 | 8 | 4
g3 | 4 | 8 | 5
g4 | 4 | 9 | 8
g5 | 5 | 8 | 5

Graphs and Molecular Structure

Syntactic Tree

Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - G, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object g

   | m1 | m2 | m3
g1 | 5 | 7 | 6
g2 | 6 | 8 | 4
g3 | 4 | 8 | 5

• $\delta(g1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g1) \sqcap \delta(g2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g1, g2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g1, g2\}$
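The interval similarity operation and the two derivation operators can be sketched as follows, using the three-object numeric table (g1, g2, g3) from the talk. Intervals are plain `(low, high)` tuples; the names `meet`, `subsumed`, `describe` and `extent` are illustrative.

```python
from functools import reduce

# delta(g): each numeric value v is turned into the interval [v, v].
DELTA = {
    "g1": ((5, 5), (7, 7), (6, 6)),
    "g2": ((6, 6), (8, 8), (4, 4)),
    "g3": ((4, 4), (8, 8), (5, 5)),
}

def meet(d1, d2):
    """Component-wise similarity: the smallest intervals covering both descriptions."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))

def subsumed(d, e):
    """d ⊑ e iff d ⊓ e = d, i.e. every interval of e lies inside the matching one of d."""
    return meet(d, e) == d

def describe(objects):
    """Common description of a non-empty collection of objects."""
    return reduce(meet, (DELTA[g] for g in objects))

def extent(d):
    """All objects whose description is subsumed by d."""
    return {g for g, e in DELTA.items() if subsumed(d, e)}

d12 = describe(["g1", "g2"])   # ((5, 6), (7, 8), (4, 6))
closed = extent(d12)           # {"g1", "g2"}
```

Since `extent(describe(["g1", "g2"]))` returns the same two objects, the pair forms a pattern concept. Object g3 is excluded because its value 4 for m1 lies outside [5, 6].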

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al 2015]
• Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where s is a subject, p is a predicate and o is an object

[Figure: Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$), where U = URIs, B = Blank Nodes, L = Literals]

Example of RDF Graph for Publications

[Figure: RDF graph with nodes NapoliLS97, Amedeo_Napoli, title1 and Classification, and edges dc:creator, dc:title and dc:subject]

To avoid confusion, we will refer to objects in FCA as entities
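The positional constraints in the definition can be made concrete with a minimal sketch. The three term classes and `is_valid_triple` are hypothetical names introduced for illustration and do not come from any RDF library. Treating `title1` as a literal is also an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

def is_valid_triple(s, p, o) -> bool:
    """Check (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (isinstance(s, (URI, BlankNode))
            and isinstance(p, URI)
            and isinstance(o, (URI, BlankNode, Literal)))

# Triples from the publication example.
creator = (URI("NapoliLS97"), URI("dc:creator"), URI("Amedeo_Napoli"))
title = (URI("NapoliLS97"), URI("dc:title"), Literal("title1"))
# A literal may not appear in the subject position.
invalid = (Literal("title1"), URI("dc:title"), URI("NapoliLS97"))
```

Only the object position accepts literals, which is why numeric or date values always end up as objects of a triple.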

RDF to Entity-Descriptions

RDF triples from several resources

tid | Subject | Predicate | Object | Provenance
t1 | s1 | dc:subject | o11 | DBLP
t2 | s2 | dc:subject | o16 | DBLP
t3 | s1 | rdf:type | Publication | DBLP
t4 | o11 | rdfs:subClassOf | C1 | ACCS
t5 | o12 | rdfs:subClassOf | C2 | ACCS
t6 | C1 | rdfs:subClassOf | C10 | ACCS

Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, rooted at ⊤, with nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

The tree is traversed depth-first, and the array D records the depth of each visited node:

Position:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D:          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
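The node sequence and the depth array D can be generated by a depth-first traversal that records a node each time it is entered or returned to (an Euler tour). The sketch below assumes the tree shape implied by the sequence on the slide; `CHILDREN` and `euler_tour` are illustrative names, and `"T"` stands for ⊤.

```python
# Parent -> ordered children, as implied by the node sequence on the slide.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Return (tour, depth): the visited nodes and the depth at each visit."""
    tour, depth = [], []

    def visit(node, d):
        tour.append(node)
        depth.append(d)
        for child in CHILDREN.get(node, []):
            visit(child, d + 1)
            # Record the parent again after coming back from each child.
            tour.append(node)
            depth.append(d)

    visit(root, 0)
    return tour, depth

tour, depth = euler_tour("T")

# 1-based position of the first occurrence of every node in the tour.
first = {}
for pos, node in enumerate(tour, start=1):
    first.setdefault(node, pos)
```

The first-occurrence map gives the positions used in the talk, for instance 4 for C1, 12 for C5 and 20 for C8.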

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Positions in the depth array D:

A = {4, 12, 20}
B = {4, 18, 22}
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the antichain {C1, C13}
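The improved approach can be sketched as below: the positions of A and B are merged and sorted, and RMQ is applied to consecutive positions only. Two points are assumptions rather than the authors' implementation: a pair is queried only when its two positions come from different sets (in the example above this is true of every consecutive pair), and candidates that are proper ancestors of another candidate are dropped at the end. The linear-scan `rmq` is a stand-in for a constant-time RMQ structure.

```python
TOUR = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "T"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
         1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# 1-based first position of each node in the tour.
FIRST = {}
for _pos, _node in enumerate(TOUR, start=1):
    FIRST.setdefault(_node, _pos)

def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])

def lca(u, v):
    return TOUR[rmq(FIRST[u], FIRST[v]) - 1]

def intersect(a, b):
    """Similarity of two antichains using RMQs on consecutive positions only."""
    pos_a = {FIRST[n] for n in a}
    pos_b = {FIRST[n] for n in b}
    merged = sorted(pos_a | pos_b)
    # Nodes present in both antichains are their own common ancestor.
    candidates = {TOUR[p - 1] for p in pos_a & pos_b}
    for p, q in zip(merged, merged[1:]):
        if (p in pos_a and q in pos_b) or (p in pos_b and q in pos_a):
            candidates.add(TOUR[rmq(p, q) - 1])
    # Keep only the most specific candidates.
    return {n for n in candidates
            if not any(m != n and lca(n, m) == n for m in candidates)}

result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

The loop issues at most $|A| + |B| - 1$ range queries, compared with $|A||B|$ for the all-pairs variant. The final filtering step in this sketch is quadratic in the number of candidates and could be replaced by a linear pass over the sorted positions.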

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy of the classes, with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

[Figure: pattern concept lattice, with concepts K0 to K11]

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Details of the pattern concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach
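Inter-ordinal scaling, mentioned above, replaces a numerical attribute by the binary attributes "≤ w" and "≥ w" for every observed value w. The sketch below is a minimal illustration of that standard construction; the function name `interordinal_scaling` and the toy values are assumptions and are not taken from the datasets on the slide.

```python
def interordinal_scaling(values):
    """Map each object to the binary attributes '<= w' and '>= w' it satisfies.

    `values` maps an object name to its numerical value for one attribute.
    """
    thresholds = sorted(set(values.values()))
    context = {}
    for obj, v in values.items():
        attrs = {f"<= {w}" for w in thresholds if v <= w}
        attrs |= {f">= {w}" for w in thresholds if v >= w}
        context[obj] = attrs
    return context


# Toy data (hypothetical, for illustration only).
toy = {"g1": 4, "g2": 5, "g3": 7}
scaled = interordinal_scaling(toy)
```

Each numerical attribute with |W| distinct values yields 2|W| binary attributes, which is one reason the scaled contexts grow quickly.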

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice, with concepts K0 to K11]

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
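One simple way to picture the analyst's feedback is to hide every concept whose intent mentions the unwanted class. The sketch below only illustrates that idea on the concept table above; the names `CONCEPTS` and `hide_class` are hypothetical, and the navigation strategy of the cited paper may be more elaborate.

```python
# Concept table from the slide: concept id -> (extent, classes of the intent).
CONCEPTS = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide_class(concepts, unwanted):
    """Keep only the concepts whose intent does not contain the unwanted class."""
    return {kid: c for kid, c in concepts.items() if unwanted not in c[1]}


visible = hide_class(CONCEPTS, "C7")
```

For C7 this hides K3, K6 and K7 and leaves the other seven concepts available for navigation.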

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given U and V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification

[Figure: instantiated graph: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title    ?keyword             ?author
title1    RDF                  author1
title2    Web Crawling         author1
title2    Web Search Engines   author2
title2    RDF                  author1
title2    RDF                  author2
title2    Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{Web\ Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
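The effect of VIEW BY can be pictured as building a formal context from the result rows: the values of the VIEW BY variable become the objects and the values of the remaining variables become the attributes. The sketch below is an illustration under that reading; `view_by` and `common_attributes` are hypothetical names, and the rows are copied from the result table on the slide.

```python
from collections import defaultdict

# Result rows (title, keyword, author) as listed on the slide.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, object_var):
    """Build a formal context: objects from `object_var`, attributes from the other columns."""
    idx = columns.index(object_var)
    context = defaultdict(set)
    for row in rows:
        for j, value in enumerate(row):
            if j != idx:
                context[row[idx]].add((columns[j], value))
    return dict(context)


def common_attributes(context, objects):
    """Derivation operator: the attributes shared by all the given objects."""
    return set.intersection(*(context[g] for g in objects))


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
```

Calling `view_by` with `"author"` instead of `"title"` gives the same answers organized from the authors' perspective, which is the idea behind viewing one result set from different perspectives.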

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x is linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a value ≤ 1900 by dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern: ?x is linked to FrenchFilm by dcterms:subject, to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom

Formal context (columns: A = a, b; B = c; C = d; D = e; E = f, g)

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
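The scaling step described above can be sketched as follows: every distinct (predicate, object) pair becomes one binary attribute, and every triple becomes one cross. This is a minimal sketch; the function name `triples_to_context` is hypothetical, and the triples are the four shown on the slide.

```python
# The four triples shown on the slide.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def triples_to_context(triples):
    """Return (objects G, attributes M, incidence I) of the scaled formal context."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


G, M, I = triples_to_context(TRIPLES)
```

Here G contains the single entity Person1, M contains four (predicate, object) attributes, and I contains one cross per triple.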

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete

Rule             Confidence   Support   Meaning
d ⇒ c            100%         5         Every scientist has won a Turing Award
c ⇒ d            100%         5         Every person who has won a Turing Award is a scientist
e ⇒ c, d         100%         2         Every person whose field is computer science is a scientist who has won a Turing Award
c, d ⇒ a, b      100%         5         All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d      71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
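The confidence of 0.71 can be reproduced with a few lines of Python. The crosses for a, b, c and d below follow the rule table (five persons have c and d, all seven have a and b); the placement of e, f and g is an assumption chosen only to match the number of crosses per row, since the column positions are not recoverable from the slide. All function names are hypothetical.

```python
# Running example; the placement of e, f and g is assumed for illustration.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def support(context, attrs):
    """Number of entities whose description contains all the given attributes."""
    return sum(1 for desc in context.values() if attrs <= desc)


def confidence(context, premise, conclusion):
    """Fraction of the entities satisfying the premise that also satisfy the conclusion."""
    return support(context, premise | conclusion) / support(context, premise)


def entities_to_complete(context, premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(g for g, d in context.items()
                  if premise <= d and not conclusion <= d)


conf_ab_cd = confidence(CONTEXT, {"a", "b"}, {"c", "d"})
conf_cd_ab = confidence(CONTEXT, {"c", "d"}, {"a", "b"})
candidates = entities_to_complete(CONTEXT, {"a", "b"}, {"c", "d"})
```

Here `conf_ab_cd` is 5/7, `conf_cd_ab` is 1.0, and `candidates` lists Person6 and Person7, the entities whose descriptions would be completed if the rule is accepted as a definition.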

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition $X \equiv Y$?" and, if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames     Smartphones        Countries
Restriction   dc:subject         dc:subject     dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type       rdf:type           rdf:type
              dc:subject         dc:subject     dc:subject         dc:subject
              bodyStyle          cp             manufacturer       language
              transmission       developer      operativeSystem    governmentType
              assembly           requirement    developer          leaderType
              designer           genre          cpu                foundingDate
              layout             releaseDate                       gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hide non-interesting parts of the lattice
• Search capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

[Figure: process of interactively exploring RDF data]

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph of the entity 350GT: dc:subject links to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type links to dbo:Automobile, dbo:manufacturer links to dbp:Lamborghini and dbo:productionYear links to 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values (columns: K_A = a, b; K_B = c; K_C = d; K_D = e; K_E = f, g; K_dbo:productionStartYear)

Entity               Crosses (a to g)   K_dbo:productionStartYear
Reventon             × × × × × ×        ⟨[2008, 2008]⟩
Countach             × × × × ×          ⟨[1974, 1974]⟩
350GT                × × × × ×          ⟨[1963, 1963]⟩
400GT                × × × ×            ⟨[1965, 1965]⟩
Islero               × × × ×            ⟨[1967, 1967]⟩
Veneno               × ×                ⟨[2012, 2012]⟩
Aventador Roadster   × ×                -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $\mathrm{range}(p) \subseteq U$: p is an object relation
  - $\mathrm{range}(p) \subseteq L$: p is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$ in the heterogeneous context with numeric values, then

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
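The numeric component of this example can be checked with a small sketch of interval patterns, where the similarity of two intervals is the smallest interval containing both. The function names `meet`, `describe` and `extent` are hypothetical; the years are those of the heterogeneous context, and Aventador Roadster is left out because its value is missing there.

```python
# Production start years from the heterogeneous context (Aventador Roadster has no value).
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}


def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))


def describe(entities):
    """Common interval description of a non-empty collection of entities."""
    intervals = [(YEARS[g], YEARS[g]) for g in entities]
    result = intervals[0]
    for interval in intervals[1:]:
        result = meet(result, interval)
    return result


def extent(interval):
    """Entities whose year lies inside the given interval."""
    lo, hi = interval
    return {g for g, year in YEARS.items() if lo <= year <= hi}


A1 = ["350GT", "400GT", "Islero"]
pattern = describe(A1)
closure = extent(pattern)
```

`pattern` is the interval (1963, 1967) and `closure` is again {350GT, 400GT, Islero}, so the set A1 is closed with respect to the interval component, as stated on the slide.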

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1   C2   C4   C5   C6   C7   C8   C9
s1           x    x                   x
s2                               x         x    x
s3                     x    x
s4                     x              x    x
s5                                         x    x

[Figure: taxonomy of the classes, with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array       [ 2 1 0 3 2 ]
Positions     1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where n is the number of elements in the array)
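For illustration, the sketch below answers range minimum queries in constant time with a sparse table. Note the swapped-in technique: a sparse table needs O(n log n) preprocessing, not the O(n) bound quoted above, which requires a more elaborate structure. The function names are hypothetical and positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def query(values, table, i, j):
    """1-based position of the minimum of values in positions i..j (inclusive)."""
    lo, hi = i - 1, j - 1
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
```

On the array of the slide, `query(ARRAY, TABLE, 1, 5)` gives position 3 (the value 0) and `query(ARRAY, TABLE, 4, 5)` gives position 5.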

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

An RMQ is computed for every pair of a position from A and a position from B, and the result is accumulated step by step:

$\mathrm{RMQ}(4, 4) = 4$, giving $\{4\}$
$\mathrm{RMQ}(4, 18) = 15$, giving $\{4, 15\}$
$\mathrm{RMQ}(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $\mathrm{Fil}(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $\mathrm{Fil}(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
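The filter-based intersection above can be sketched as follows. The `PARENT` map encodes the tree as read off the Euler tour used in the RMQ slides, with `T` standing for ⊤; the function names are hypothetical and the sketch assumes the taxonomy is a tree.

```python
# Child -> parent map of the taxonomy tree ("T" stands for the top node).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def fil(antichain):
    """Filter of an antichain: every element together with all of its ancestors."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def most_specific(nodes):
    """Members of an up-closed set that are not the parent of another member."""
    parents = {PARENT[n] for n in nodes if n in PARENT}
    return nodes - parents


def intersect_by_scaling(a, b):
    """Intersect two antichains through their filters."""
    return most_specific(fil(a) & fil(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = intersect_by_scaling(A, B)
```

`common` is {C1, C13}, the same antichain obtained with the RMQ-based procedure, but each filter may contain up to |T| nodes, which is the cost the complexity discussion refers to.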

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Towards Web of Data

Query

List of Turing Award Winners

A more specific Query

List of American Turing Award Winners

Requires Resource Integration

• List of Turing Award Winners

• List of American Computer Scientists

From Web of Documents to Web of Data

Characteristics of Web of Documents

• Unstructured Data

• Human Processable

• Not easily processed by machines

[Figure: RDF graph with nodes Leslie Lamport, Turing Award, Scientist, United States, Washington DC and English, and edges dbp:award, rdf:type, dbo:birthPlace, dbo:capital and dbo:officialLanguage]

Google Knowledge Graph


Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together

• detect the diversity of an author

• detect major area of research for an author

• given a paper, is it possible to retrieve similar papers published

What do we need?

• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery

• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

[Figure: Interactive Knowledge Discovery + Completion]

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]

• Navigala [Visani et al 2011]

• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation

• Galois connection

  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$

  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$

• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×

[Figure: Concept Lattice]
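The two derivation operators can be tried out on the small context above. The sketch below is illustrative only: the placement of the crosses in `CONTEXT` is an assumption chosen to agree with the examples given in the slides (for instance the concept ({g2, g5}, {m1, m2})), and the brute-force enumeration is not meant as an efficient lattice algorithm.

```python
from itertools import combinations

# Assumed cross table, consistent with the examples quoted in the slides.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': the objects that carry every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def concepts():
    """Enumerate all formal concepts by closing every attribute subset."""
    found = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), size):
            ext = extent(subset)
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found


for ext, itt in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), sorted(itt))
```

Closing every attribute subset is exponential in |M|, so this only serves to check small examples by hand.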

Formal Concept Analysis

[Ganter and Wille 1999]

• Object Concept for $g \in G$
  - Object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$

• Attribute Concept for $m \in M$
  - Attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$

(Same formal context and concept lattice as on the previous slide.)

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule: $m_2 \rightarrow m_3$

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication: $m_3 \Rightarrow m_2$

$\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$

(Computed on the formal context of the previous slides.)
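The support and confidence values quoted above can be reproduced with a short script. As before, the cross table in `CONTEXT` is an assumed reconstruction that matches the numbers on the slide, and the function names are illustrative.

```python
from fractions import Fraction

# Assumed cross table, consistent with the figures quoted on the slide.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that carry every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (assumes X' is not empty)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def is_implication(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(is_implication({"m3"}, {"m2"}))
```

An implication is simply an association rule whose confidence is 1, which is why m3 ⇒ m2 holds here while m2 → m3 does not.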

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric Data

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5

• Graphs and Molecular Structure

• Syntactic Tree

• Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
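The interval similarity operation is easy to check by hand or in code. The following is a minimal sketch with illustrative function names; intervals are represented as `(low, high)` tuples.

```python
def delta(row):
    """Turn a row of numbers into a tuple of degenerate intervals [v, v]."""
    return tuple((v, v) for v in row)


def meet_interval(i1, i2):
    """[a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]"""
    (a1, b1), (a2, b2) = i1, i2
    return (min(a1, a2), max(b1, b2))


def meet(d1, d2):
    """Component-wise similarity of two interval descriptions."""
    return tuple(meet_interval(x, y) for x, y in zip(d1, d2))


g1 = delta((5, 7, 6))
g2 = delta((6, 8, 4))
print(meet(g1, g2))
```

The similarity of two descriptions is the smallest interval vector covering both, so it only ever becomes wider as more objects are added.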

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects:

  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
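A small sketch of the two derivation operators on the three-object interval table of the previous slide is given below. The names `description` and `extent` are illustrative, and subsumption is tested through the usual identity d ⊑ e iff d ⊓ e = d.

```python
from functools import reduce

# The three-object numeric table used in the pattern-structure example.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of object g as a tuple of degenerate intervals."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Component-wise interval similarity."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumes(d, e):
    """d ⊑ e iff d ⊓ e == d, i.e. every interval of e lies inside d."""
    return meet(d, e) == d


def description(objects):
    """Similarity of the descriptions of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d):
    """All objects whose description is subsumed by d."""
    return {g for g in DATA if subsumes(d, delta(g))}


d = description(["g1", "g2"])
print(d, sorted(extent(d)))
```

Here g3 is excluded from the extent because its first value, 4, falls outside the interval [5, 6].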

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al 2011]

• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

• Syntactic Tree [Leeuwenberg et al 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Resource Description Framework

Example of RDF Graph for Publications

[Figure: RDF graph with nodes NapoliLS97, title1, Amedeo_Napoli and Classification, and edges dc:creator, dc:title and dc:subject]

To avoid confusion, we will refer to objects in FCA as entities.

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy $T$, a tree rooted at ⊤ over the classes C1, C2, C4–C15]

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets?

• Scaling

• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes

• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree rooted at ⊤ over the classes C1, C2, C4–C13]

The array D holds the depth of each node visited during a depth-first traversal of the tree:

Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
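The depth array D can be generated by a depth-first traversal that records a node every time it is entered or returned to. The sketch below assumes the parent-to-children map `CHILDREN` as a reading of the tree on the slide (with `TOP` standing for ⊤); the function name is illustrative.

```python
# Assumed encoding of the slide's taxonomy: parent -> ordered children.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (nodes, depths) recorded on entering and re-entering each node."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            # Coming back from the child: record the parent again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour("TOP")
print(" ".join(nodes))
print(" ".join(str(d) for d in depths))
```

With this layout, the least common ancestor of two nodes is the entry of minimal depth between any of their positions, which is what turns LCA computation into a range minimum query.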

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

(The arrays Index, Node and D are those of the previous slide.)

Positions: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$

$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$
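The walk-through above can be replayed with the following sketch. It hard-codes the node and depth arrays from the slide (with `TOP` standing for ⊤), answers each RMQ by a plain linear scan rather than a constant-time RMQ structure, and uses hypothetical helper names; it is meant to illustrate the idea, not to reproduce the authors' implementation.

```python
# Node and depth arrays copied from the slide (positions are 1-based).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1,
         0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the first minimum of DEPTH between i and j."""
    return min(range(i, j + 1), key=lambda k: DEPTH[k - 1])


def position(node):
    """1-based position of the first occurrence of `node` in the tour."""
    return NODES.index(node) + 1


def is_ancestor_or_self(u, v):
    """True if u lies on the path from v up to the root."""
    lo, hi = sorted((position(u), position(v)))
    return NODES[rmq(lo, hi) - 1] == u


def intersect(a, b):
    """Antichain intersection through RMQs on consecutive positions."""
    merged = sorted(position(n) for n in list(a) + list(b))
    lcas = {NODES[rmq(i, j) - 1] for i, j in zip(merged, merged[1:])}
    # Keep only the most specific classes (those with no descendant in lcas).
    return {u for u in lcas
            if not any(v != u and is_ancestor_or_self(u, v) for v in lcas)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

Only consecutive positions of the merged list are queried, which is where the O(|A| + |B|) bound on the number of RMQs comes from.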

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy tree rooted at ⊤ over the classes C1, C2, C4–C15]

[Figure: Pattern Concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s

Biomedical Data 63 1490 933 1725582 145s 162s

Experiments on Linked Data

Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec

NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec

PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec

Experiments with numerical data from Bilkent University

sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q

1Ď pm2q

1 means that m1 ď m2

sbquo Not all datasets could be processed using scaling approach ()

Mehwish Alam Interactive Knowledge Discovery over Web of Data 26

What we did so far

How to allow feedback from domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: Pattern Concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4 Creating Views over RDF-Graphs

SPARQL

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }

[Figure: query graph with ?paper linked to ?author (dc:creator), ?title (dc:title) and ?keywords (dc:subject)]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: RDF graph with nodes NapoliLS97, title1, Amedeo_Napoli and Classification, and edges dc:creator, dc:title and dc:subject]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \dots\}$

• $M_1 = \mu(A_{v_1}) = \{author1, author2, \dots\}$

• $M_2 = \mu(A_{v_2}) = \{WebCrawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
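The effect of VIEW BY on a result table can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the rows mirror the result table above, each row is a plain dictionary, and `view_by` is an assumed helper name. The variable chosen in VIEW BY supplies the entities, and the remaining variables supply the attributes.

```python
# Query results as a list of solution mappings (variable -> value).
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, entity_var):
    """Build a context: entity -> set of (variable, value) attributes."""
    context = {}
    for row in rows:
        attrs = context.setdefault(row[entity_var], set())
        attrs.update((var, val) for var, val in row.items()
                     if var != entity_var)
    return context


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
for entity, attrs in sorted(by_title.items()):
    print(entity, sorted(attrs))
```

Calling `view_by` with a different variable, as in `by_author`, yields a different context and hence a different lattice over the same query results.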

Views from Different Perspectives

Example

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

Example

  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: query graph with ?x rdf:type Person, ?x dbp:birthPlace Berlin and ?x dbp:birthDate ≤ 1900]

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }

French Films

[Figure: query graph with ?x dcterms:subject FrenchFilm]

  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm .
  }

[Figure: query graph with ?x rdf:type Film and ?x dbo:hasCountry France]

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France .
  }

From RDF Triples to Formal Context

RDF triples

  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

[Table: formal context with rows Person1 to Person7 and columns a, b (A), c (B), d (C), e (D), f, g (E). Person1 has 6 crosses, Person2 and Person3 have 5, Person4 and Person5 have 4, Person6 and Person7 have 2.]

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting denitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$

• $X \equiv Y$ is called a definition

• If $X \Rightarrow Y$ and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                          Confidence  Support  Meaning
$d \Rightarrow c$             100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$             100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$          100%        2        All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$       100%        5        All the Scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$       71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won the Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description

• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
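The reasoning above can be replayed on a reduced version of the running example. The context below is an assumption: it keeps only the attributes a to d and places the crosses as implied by the rule table (all seven persons carry a and b, Person1 to Person5 also carry c and d). The helper names are illustrative.

```python
from fractions import Fraction

# Assumed context restricted to attributes a-d, as implied by the rule table.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT["Person6"] = {"a", "b"}
CONTEXT["Person7"] = {"a", "b"}


def extent(attributes):
    """Entities that carry every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def confidence(x, y):
    """Share of the entities satisfying x that also satisfy y."""
    return Fraction(len(extent(x | y)), len(extent(x)))


def holds(x, y):
    """The implication x => y holds iff every entity with x also has y."""
    return extent(x) <= extent(y)


def needs_completion(x, y):
    """Entities that satisfy x but are missing part of y."""
    return extent(x) - extent(y)


AB, CD = {"a", "b"}, {"c", "d"}
print(holds(CD, AB), float(confidence(AB, CD)))
print(sorted(needs_completion(AB, CD)))
```

Adding c and d to the two entities returned by `needs_completion` would turn the high-confidence rule into an implication and the pair of rules into a definition.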

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, "Can a rule be a definition $X \equiv Y$?", Yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset

• Human evaluation as the ground truth: each list of implications has a 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  - Completing RDF-Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer

[Figure: Interactive Knowledge Discovery + Completion]

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph for the entity 350GT, connecting the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date) through the edges dc:subject, rdf:type, dbo:manufacturer and dbo:productionYear]

Why Pattern Structures

K_A K_B K_C K_D K_E K_dbo:productionStartYear
a b c d e f g

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s1 = \{C1, C2, C7\}$ and $s2 = \{C6, C8, C9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: subsumption hierarchy rooted at ⊤ over the nodes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M, x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2, m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query, where $n$ is the number of elements in the array
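To make the RMQ problem concrete, here is a minimal Python sketch that answers range-minimum queries on the five-element array above. It is not the structure used in the talk: it assumes a sparse table, which gives $O(1)$ queries but needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted on the slide. The names `build_sparse_table` and `rmq` are hypothetical.

```python
def build_sparse_table(values):
    """Sparse table: table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # level 0: windows of length 1
    length = 1
    while 2 * length <= n:
        prev = table[-1]
        level = []
        for i in range(n - 2 * length + 1):
            left, right = prev[i], prev[i + length]
            # Keep the leftmost minimum when both halves tie.
            level.append(left if values[left] <= values[right] else right)
        table.append(level)
        length *= 2
    return table


def rmq(values, table, lo, hi):
    """Return the 1-based position of the minimum among positions lo..hi (1-based, inclusive)."""
    i, j = lo - 1, hi - 1
    k = (j - i + 1).bit_length() - 1  # largest power of two fitting in the range
    left = table[k][i]
    right = table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


values = [2, 1, 0, 3, 2]  # the array from the slide
table = build_sparse_table(values)
print(rmq(values, table, 1, 5))  # position of the minimum over the whole array
```

Each query combines two overlapping power-of-two windows, which is why no loop is needed at query time.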

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Every element of A is paired with every element of B, for example:

• RMQ(4, 4) = 4
• RMQ(4, 18) = 15
• RMQ(20, 18) = 19

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e., the antichain $\{C1, C13\}$
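The pairwise computation can be sketched in Python as follows. This is an illustrative sketch, not the implementation behind the talk: the taxonomy is re-entered by hand from the Euler tour above (with `TOP` standing for ⊤), each RMQ is a plain linear scan, and the helper names `euler_tour`, `is_ancestor` and `naive_meet` are hypothetical.

```python
from itertools import product

# Taxonomy read off the Euler tour on the slide, as parent -> ordered children.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root, children):
    """Return the visited nodes and their depths along a depth-first Euler tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def is_ancestor(parent_of, ancestor, node):
    """True if `ancestor` lies on the path from `node` up to the root (or equals it)."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)
    return False


def naive_meet(a, b, children, root="TOP"):
    """Similarity of two antichains: most specific common ancestors over all pairs."""
    nodes, depths = euler_tour(root, children)
    first = {}
    for pos, node in enumerate(nodes):
        first.setdefault(node, pos)  # first occurrence of each node in the tour
    parent_of = {c: p for p, cs in children.items() for c in cs}
    candidates = set()
    for x, y in product(a, b):  # |A| * |B| range-minimum queries
        lo, hi = sorted((first[x], first[y]))
        best = min(range(lo, hi + 1), key=depths.__getitem__)
        candidates.add(nodes[best])
    # Drop every candidate that is a proper ancestor of another candidate.
    return {c for c in candidates
            if not any(c != d and is_ancestor(parent_of, c, d) for d in candidates)}


print(sorted(naive_meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"}, CHILDREN)))
```

The final filtering step corresponds to reducing $\{4, 8, 15, 19, 21\}$ to $\{4, 21\}$ on the slide.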

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
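The filter-based computation above can be reproduced with a few lines of Python. This is a sketch under the assumption that the taxonomy is the tree shown on the slide, stored here as a child-to-parent map with `TOP` standing for ⊤; `filter_of` and `minimal_elements` are hypothetical helper names.

```python
# Child -> parent map for the taxonomy tree on the slide (TOP is the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain, parent_of):
    """All nodes that are equal to or more general than some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = parent_of.get(node)  # None once the root has been passed
    return result


def minimal_elements(nodes, parent_of):
    """Most specific nodes of a set: remove every proper ancestor of a member."""
    ancestors = set()
    for node in nodes:
        up = parent_of.get(node)
        while up is not None:
            ancestors.add(up)
            up = parent_of.get(up)
    return set(nodes) - ancestors


fil_a = filter_of({"C1", "C5", "C8"}, PARENT)
fil_b = filter_of({"C1", "C7", "C9"}, PARENT)
common = fil_a & fil_b
print(sorted(common))
print(sorted(minimal_elements(common, PARENT)))
```

Both filters grow with the height of the tree, which is the cost the complexity slide attributes to the scaling approach.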

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Towards Web of Data

Query

List of Turing Award Winners

A more specific Query

List of American Turing Award Winners

Requires Resource Integration

• List of Turing Award Winners
• List of American Computer Scientists

From Web of Documents to Web of Data

Characteristics of Web of Documents

• Unstructured Data
• Human Processable
• Not easily processed by machines

[Figure: RDF graph linking the nodes Leslie Lamport, Turing Award, Scientist, United States, Washington DC and English through the edges dbo:birthPlace, dbp:award, rdf:type, dbo:capital and dbo:officialLanguage]

Google Knowledge Graph

Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect major area of research for an author
• given a paper, is it possible to retrieve similar papers published

What do we need?

• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

Interactive Knowledge Discovery + Completion

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al 2011]
• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I$ is a binary relation between $G$ and $M$
• Galois connection
  - $A' = \{m \in M \mid \forall g \in A, (g, m) \in I\}$ for $A \subseteq G$
  - $B' = \{g \in G \mid \forall m \in B, (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)
• Object concept for $g \in G$: $(g'', g')$, e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute concept for $m \in M$: $(m', m'')$, e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×

[Figure: concept lattice of this context]
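The two derivation operators can be tried out on the five-object context with a short Python sketch. The placement of the crosses is an assumption: it is the assignment consistent with the object-concept and attribute-concept examples given on the slides. The names `CONTEXT`, `intent` and `extent` are hypothetical.

```python
# Formal context: object -> set of attributes it has.
# Assumption: cross positions inferred from the examples on the slides.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : attributes shared by every object in the set."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B' : objects that have every attribute in the set."""
    return {g for g, ms in CONTEXT.items() if set(attributes) <= ms}


# Object concept of g2: (g2'', g2')
g2_intent = intent({"g2"})
print(sorted(extent(g2_intent)), sorted(g2_intent))

# Attribute concept of m1: (m1', m1'')
m1_extent = extent({"m1"})
print(sorted(m1_extent), sorted(intent(m1_extent)))
```

Applying the two operators one after the other always yields a closed set, which is why each pair printed above is a formal concept.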

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule $m_2 \rightarrow m_3$

• $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
• $conf(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

• $conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
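The support and confidence values above can be checked with a small Python sketch. As an assumption, the context is filled in with the cross positions that are consistent with the figures on the slides; `support`, `confidence` and `implication_holds` are hypothetical helper names.

```python
# Formal context: object -> attributes (cross positions are an assumption
# chosen to be consistent with the worked numbers on the slides).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X' : objects having all attributes of X."""
    return {g for g, ms in CONTEXT.items() if set(attributes) <= ms}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (X' must be non-empty)."""
    return len(extent(x) & extent(y)) / len(extent(x))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(confidence({"m3"}, {"m2"}), implication_holds({"m3"}, {"m2"}))
```

An implication is simply an association rule whose confidence is 100%, which is what the last line illustrates.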

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric Data

        m1  m2  m3
  g1    5   7   6
  g2    6   8   4
  g3    4   8   5
  g4    4   9   8
  g5    5   8   5

• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:

• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
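The interval example can be replayed in Python. This is a minimal sketch on the three-object numeric table from the pattern structure slides; it assumes that $d \sqsubseteq e$ is tested as $d \sqcap e = d$, and the names `delta`, `meet`, `describe` and `extent` are hypothetical.

```python
from functools import reduce

# Numeric table from the slides: object -> (m1, m2, m3).
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Describe an object as a tuple of degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Component-wise similarity: [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def describe(objects):
    """Common description of a non-empty set of objects (meet of all their descriptions)."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d):
    """Objects whose description is subsumed by d, tested as d ⊓ δ(g) = d."""
    return {g for g in DATA if meet(d, delta(g)) == d}


d = describe({"g1", "g2"})
print(d)                   # common interval description of g1 and g2
print(sorted(extent(d)))   # objects covered by that description
```

Because the meet widens intervals, a more general description is one with larger intervals, and g3 is excluded since its m1 value 4 falls outside [5, 6].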

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al 2015]
• Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object

[Figure: Subject ($U \cup B$) linked by a predicate ($U$) to an Object ($U \cup B \cup L$), where U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: RDF graph in which NapoliLS97 is linked to Amedeo_Napoli, title1 and Classification through the edges dc:creator, dc:title and dc:subject]

To avoid confusion, we will refer to objects in FCA as entities

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy rooted at ⊤ over the nodes C1, C2 and C4 to C15]

ACM Computing Classification Taxonomy |T|

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets?

• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, Range Minimum Query finds the minimal value in a sub-array of this array

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

The array D of depths is built step by step along a depth-first traversal of the tree:

Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
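The depth array D can be generated mechanically from the tree. The Python sketch below is illustrative only: the tree is entered by hand from the slide (with `TOP` standing for ⊤), and `euler_tour` and `first_positions` are hypothetical helper names. Positions are reported 1-based to match the slide.

```python
# Taxonomy tree from the slide, as parent -> ordered list of children.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root, children):
    """Depth-first tour that records a node on entry and again after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def first_positions(nodes):
    """1-based position of the first occurrence of every node in the tour."""
    first = {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)
    return first


nodes, depths = euler_tour("TOP", CHILDREN)
first = first_positions(nodes)
print(depths)
print([first[c] for c in ("C1", "C5", "C8")])  # positions used for antichain A
```

The lowest common ancestor of two nodes is then the node at the minimum-depth position between their first occurrences.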

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e., the antichain $\{C1, C13\}$

Resulting Pattern Concept Lattice

[Figure: pattern concept lattice with the concepts K0 (top) to K11 (bottom)]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, with edges ?paper dc:creator ?author, ?paper dc:title ?title and ?paper dc:subject ?keywords]

Mapping $\mu : V \to U$

Given U and V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification

[Figure: the graph pattern instantiated with title1, NapoliLS97, Amedeo_Napoli and Classification]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
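
How a VIEW BY clause could turn query solutions into object-attribute relations can be sketched in Python. This is a hedged illustration, not the authors' implementation: the function `view_by` and the dictionary layout are assumptions, and the three rows are taken from the result table of this slide.

```python
from collections import defaultdict


def view_by(rows, entity_var):
    """Group solution rows by the VIEW BY variable.

    Returns one relation per remaining variable, mapping each object
    (a value of entity_var) to the set of attribute values it occurs with.
    """
    contexts = defaultdict(lambda: defaultdict(set))
    for row in rows:
        obj = row[entity_var]
        for var, value in row.items():
            if var != entity_var:
                contexts[var][obj].add(value)
    return {var: dict(relation) for var, relation in contexts.items()}


# Three of the solution rows shown on the slide.
rows = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]

by_title = view_by(rows, "title")    # objects are titles
by_author = view_by(rows, "author")  # objects are authors
print(by_title["keyword"])
print(by_author["title"])
```

Changing the second argument switches the set of objects, which is the effect of replacing `VIEW BY ?title` with `VIEW BY ?author`.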

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern with edges ?x rdf:type Person, ?x dbp:birthPlace Berlin and ?x dbp:birthDate $\leq$ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern with edge ?x dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

[Figure: graph pattern with edges ?x rdf:type Film and ?x dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom

Formal context over the columns a, b (A), c (B), d (C), e (D) and f, g (E):

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
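
The step from triples to a formal context can be sketched in Python. This is a minimal illustration, assuming that each (predicate, object) pair becomes one attribute; the helper names `build_context` and `attributes` are hypothetical, and only the four triples listed on the slide are used.

```python
from collections import defaultdict

# The four example triples of the slide, as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def build_context(triples):
    """Map every subject to the set of (predicate, object) attributes it has."""
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


def attributes(context):
    """All attributes that occur in the context, in a stable order."""
    return sorted({m for row in context.values() for m in row})


context = build_context(TRIPLES)
for attribute in attributes(context):
    # One line per attribute, one mark per entity: "x" stands for a cross.
    marks = " ".join("x" if attribute in context[g] else "." for g in sorted(context))
    print(attribute, marks)
```

Each printed "x" plays the role of a cross in the context above, i.e. one triple of the input.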

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                     Confidence   Support   Meaning
$d \Rightarrow c$        100%         5         Every scientist has won a Turing Award
$c \Rightarrow d$        100%         5         Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$     100%         2         Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$  100%         5         All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$          71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
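
The confidence values of the running example can be checked with a short Python sketch. The context below is an assumption: it is one assignment of crosses that is consistent with the row counts and the rule supports of the slides, but the exact placement of e and f for Person2 and Person3 is not recoverable from the transcript. The helper names `extent`, `confidence` and `needs_completion` are illustrative.

```python
# Hypothetical context, consistent with the supports reported on the slide.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdf"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attrs):
    """Entities that have every attribute in attrs."""
    return {g for g, description in CONTEXT.items() if attrs <= description}


def confidence(premise, conclusion):
    """Confidence of the association rule premise -> conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)


ab, cd = set("ab"), set("cd")
print(round(confidence(ab, cd), 2))        # 0.71
print(confidence(cd, ab))                  # 1.0
print(sorted(needs_completion(ab, cd)))    # ['Person6', 'Person7']
```

The entities returned by `needs_completion` are the ones whose descriptions would be completed if the rule is accepted as a definition.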

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow Reference Universe, Formal Context, Mining Implications, Ranking Implications, "Can a rule be a definition $X \equiv Y$?", Yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    govenmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS: Front_Person_Shooters, cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph of the entity 350GT, with edges dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini and dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Entity               K_A to K_E (a b c d e f g)   K_dbo:productionStartYear
Reventon             × × × × × ×                  $\langle [2008, 2008] \rangle$
Countach             × × × × ×                    $\langle [1974, 1974] \rangle$
350GT                × × × × ×                    $\langle [1963, 1963] \rangle$
400GT                × × × ×                      $\langle [1965, 1965] \rangle$
Islero               × × × ×                      $\langle [1967, 1967] \rangle$
Veneno               × ×                          $\langle [2012, 2012] \rangle$
Aventador Roadster   × ×                          -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Entity               K_A to K_E (a b c d e f g)   K_dbo:productionStartYear
Reventon             × × × × × ×                  $\langle [2008, 2008] \rangle$
Countach             × × × × ×                    $\langle [1974, 1974] \rangle$
350GT                × × × × ×                    $\langle [1963, 1963] \rangle$
400GT                × × × ×                      $\langle [1965, 1965] \rangle$
Islero               × × × ×                      $\langle [1967, 1967] \rangle$
Veneno               × ×                          $\langle [2012, 2012] \rangle$
Aventador Roadster   × ×                          -

Let $A_1 = \{350GT, 400GT, Islero\}$, then
• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
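
The closure computed on this slide can be replayed with a small Python sketch. It rests on assumptions: the binary part only keeps the attributes a to d, which the slide's derivation fixes for the five older models, while the rows of Veneno and Aventador Roadster are assumed to hold a and b; the columns e, f and g are left out because their crosses cannot be placed from the transcript. All function names are illustrative.

```python
# Production start years from the heterogeneous context (Aventador Roadster has none).
YEARS = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
         "400GT": 1965, "Islero": 1967, "Veneno": 2012}

# Binary part restricted to a..d; the last two rows are an assumption.
BINARY = {
    "Reventon": set("abcd"), "Countach": set("abcd"), "350GT": set("abcd"),
    "400GT": set("abcd"), "Islero": set("abcd"),
    "Veneno": set("ab"), "Aventador Roadster": set("ab"),
}


def common_attributes(entities):
    """Attributes shared by all entities (derivation in the binary contexts)."""
    return set.intersection(*(BINARY[g] for g in entities))


def entities_with(attrs):
    """Entities that have all the given attributes."""
    return {g for g, description in BINARY.items() if attrs <= description}


def common_interval(entities):
    """Smallest interval covering the years of all entities (interval pattern)."""
    years = [YEARS[g] for g in entities]
    return (min(years), max(years))


def entities_in(interval):
    """Entities whose year lies inside the interval."""
    low, high = interval
    return {g for g, year in YEARS.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
attrs = common_attributes(A1)                          # {'a', 'b', 'c', 'd'}
interval = common_interval(A1)                         # (1963, 1967)
closure = entities_with(attrs) & entities_in(interval)
print(sorted(attrs), interval, sorted(closure))
```

The final intersection mirrors the last derivation on the slide: the extent obtained from the binary contexts is intersected with the extent obtained from the interval, which gives back the set A1.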

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s1 = \{C1, C2, C7\}$ and $s2 = \{C6, C8, C9\}$

Entities S   C1   C2   C4   C5   C6   C7   C8   C9
s1           x    x                   x
s2                               x         x    x
s3                     x    x
s4                     x              x    x
s5                                         x    x

[Figure: taxonomy tree rooted at $\top$ over the classes C1, C2 and C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M, x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set where elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$ the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$ the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2, m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions for this array

Array     [ 2 1 0 3 2 ]
Positions   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
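
A range minimum query structure can be sketched in Python with a sparse table. Note the swap: this is not the $O(n)$ preprocessing scheme mentioned on the slide but a simpler variant with $O(n \log n)$ preprocessing and $O(1)$ time per query. The names `build_sparse_table` and `rmq` are illustrative, and positions are 1-based to match the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # level 0: every index is its own minimum
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            # Ties go to the left so that the leftmost minimum is reported.
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, i, j):
    """1-based position of the minimum of values between positions i and j."""
    i, j = min(i, j) - 1, max(i, j) - 1
    k = (j - i + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 1, 5))  # 3
print(rmq(ARRAY, TABLE, 4, 5))  # 5
```

For the array of the slide, the minimum over all five positions is the value 0 at position 3.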

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position  1      2      3      4      5      6      7      8      9      10     11     12     13
D         0      1      2      3      2      3      2      1      2      3      2      3      2
Node      $\top$ C12    C10    C1     C10    C2     C10    C12    C11    C4     C11    C5     C11

Position  14     15     16     17     18     19     20     21     22     23     24     25
D         1      0      1      2      3      2      3      2      3      2      1      0
Node      C12    $\top$ C6     C13    C7     C13    C8     C13    C9     C13    C6     $\top$

Every pair of positions $(a, b)$ with $a \in A$ and $b \in B$ is queried in turn, for example:
RMQ(4, 4) = 4, RMQ(4, 18) = 15, RMQ(20, 18) = 19

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain

[Figure: taxonomy tree rooted at $\top$, with inner nodes C12, C10, C11, C6 and C13 and leaves C1, C2, C4, C5, C7, C8 and C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$
$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
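
The filter-based intersection can be sketched in Python on the same tree. This is an illustration under assumptions: the parent links in `PARENT` are read off the Euler tour of the example, the top element is written `"TOP"`, and the helper names are hypothetical.

```python
# Child -> parent links of the example tree ("TOP" is the root).
PARENT = {
    "C12": "TOP", "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C6": "TOP", "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(node) for node in antichain))


def most_specific(nodes):
    """Drop every node that is a strict ancestor of another node in the set."""
    return {n for n in nodes
            if not any(m != n and n in up_set(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common))                 # ['C1', 'C10', 'C12', 'C13', 'C6', 'TOP']
print(sorted(most_specific(common)))  # ['C1', 'C13']
```

Each filter can grow with the height of the tree for every element of the antichain, which is why this route handles more elements than the RMQ-based one.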

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$ and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$ and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is costlier than $O(|Leaves(T)|)$ for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

Page 5: Interactive Knowledge Discovery over Web of Data

Towards Web of Data

Query

List of Turing Award Winners

A more specific Query

List of American Turing Award Winners

Requires Resource Integration

• List of Turing Award Winners
• List of American Computer Scientists

From Web of Documents to Web of Data

Characteristics of Web of Documents

• Unstructured Data
• Human Processable
• Not easily processed by machines

[Figure: RDF graph with nodes Leslie Lamport, Turing Award, Scientist, United States, Washington DC and English, and edges labeled dbo:birthPlace, dbp:award, rdf:type, dbo:capital and dbo:officialLanguage]

Google Knowledge Graph

Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect major area of research for an author
• given a paper, is it possible to retrieve similar papers published?

What do we need?

• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

Interactive Knowledge Discovery + Completion

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann, 2012]
• Navigala [Visani et al., 2011]
• Relational Exploration [Rudolph, 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille, 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  - $A' = \{ m \in M \mid \forall g \in A,\ (g, m) \in I \}$ for $A \subseteq G$
  - $B' = \{ g \in G \mid \forall m \in B,\ (g, m) \in I \}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille, 1999]

• Object Concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$ for $g_2$
• Attribute Concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$ for $m_1$
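The two derivation operators and the object and attribute concepts can be sketched in a few lines of Python. This is only an illustrative sketch: the names `CONTEXT`, `intent`, `extent`, `object_concept` and `attribute_concept` are hypothetical, and the cross positions of the running example are an assumption inferred from the concept examples given on the slide (g1 has m1; g2 has m1, m2; g3 and g4 have m2, m3; g5 has all three).

```python
# Running example context (assumed cross positions): object -> attributes it has.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """(g'', g'): the most specific concept whose extent contains g."""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """(m', m''): the most general concept whose intent contains m."""
    a = extent({m})
    return a, intent(a)
```

Under these assumptions, `object_concept("g2")` and `attribute_concept("m1")` reproduce the two examples quoted on the slide.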

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule

$m_2 \rightarrow m_3$

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
$conf(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication

$m_3 \Rightarrow m_2$

$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
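The support and confidence figures of the running example can be checked with a short Python sketch. The function names are hypothetical, and the context is again an assumption reconstructed from the numbers on the slides (m1 holds for g1, g2, g5; m2 for g2 to g5; m3 for g3 to g5). Exact fractions are used so that 3/5 and 3/4 appear as such.

```python
from fractions import Fraction

# Assumed running example: object -> attributes it has.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' n Y'| / |G| for the rule X -> Y."""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' n Y'| / |X'| for the rule X -> Y (X' must be non-empty)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def is_implication(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)
```

With this data, `support({"m2"}, {"m3"})` is 3/5, `confidence({"m2"}, {"m3"})` is 3/4, and `is_implication({"m3"}, {"m2"})` is true while the converse is not.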

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric Data

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5

• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data

Pattern Structures

[Ganter and Kuznetsov, 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
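The interval similarity operation and the two derivation operators of a pattern structure can be illustrated with the three-object numeric table. The sketch below is a hedged illustration rather than the implementation used in the work: all function names are hypothetical, and subsumption is tested through the usual identity that $d \sqsubseteq e$ holds exactly when $d \sqcap e = d$.

```python
from functools import reduce

# Three-object numeric table from the slide: object -> (m1, m2, m3).
DELTA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def describe(g):
    """delta(g): each numeric value v becomes the degenerate interval [v, v]."""
    return tuple((v, v) for v in DELTA[g])


def meet(d1, d2):
    """Similarity of two interval vectors: component-wise convex hull."""
    return tuple(
        (min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)
    )


def subsumed(d, e):
    """d is subsumed by e iff meet(d, e) == d, i.e. e lies inside d."""
    return meet(d, e) == d


def common_description(objects):
    """Maximal description shared by a non-empty set of objects."""
    return reduce(meet, (describe(g) for g in objects))


def objects_sharing(d):
    """Maximal set of objects whose description is covered by d."""
    return {g for g in DELTA if subsumed(d, describe(g))}
```

Applying `common_description` to g1 and g2 gives the interval vector shown on the slide, and `objects_sharing` on that vector returns g1 and g2 again, so the pair forms a pattern concept. Object g3 is excluded because its m1 value 4 lies outside [5, 6].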

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al., 2011]

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5

• Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]
• Syntactic Tree [Leeuwenberg et al., 2015]
• Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: a triple drawn as Subject ($U \cup B$) linked by a predicate ($U$) to Object ($U \cup B \cup L$), where U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: RDF graph with nodes NapoliLS97, title1, Amedeo_Napoli and Classification, and edges labeled dc:creator, dc:title and dc:subject]

To avoid confusion, we will refer to objects in FCA as entities.

RDF to Entity-Descriptions

  tid  Subject  Predicate        Object       Provenance
  t1   s1       dc:subject       o11          DBLP
  t2   s2       dc:subject       o16          DBLP
  t3   s1       rdf:type         Publication  DBLP
  t4   o11      rdfs:subClassOf  C1           ACCS
  t5   o12      rdfs:subClassOf  C2           ACCS
  t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

  Entities S  Descriptions d
  s1          (dc:subject, {C1, C2, C7})
  s2          (dc:subject, {C6, C8, C9})
  s3          (dc:subject, {C4, C5})
  s4          (dc:subject, {C4, C7, C8})
  s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy $T$, a tree rooted at ⊤ with nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8 and C9]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree in which ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

Euler tour of the taxonomy with the depth D of each visited node:

  Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
  D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
  Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤

Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

  Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
  D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
  Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}

Indices 8 (C12) and 15 (⊤) are ancestors of C1, and indices 19 and 21 both denote C13, so the resulting antichain is {C1, C13}.
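The worked example can be reproduced with a small Python sketch. Several points are assumptions of this sketch rather than details confirmed by the slides: the function names are hypothetical, indices are 0-based (the slides count from 1), the RMQ is a plain linear scan instead of a constant-time structure such as a sparse table, and only consecutive elements that come from different sets are queried, since a pair from the same set cannot contribute a more specific common ancestor.

```python
# Taxonomy from the slide as node -> children ("TOP" stands for the root).
TAXONOMY = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the Euler tour, the depth of each entry, and first positions."""
    tour, depths, first = [], [], {}

    def visit(node, depth):
        first.setdefault(node, len(tour))
        tour.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            tour.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return tour, depths, first


def lca(u, v, tour, depths, first):
    """Least common ancestor = node of minimal depth between u and v."""
    i, j = sorted((first[u], first[v]))
    k = min(range(i, j + 1), key=depths.__getitem__)  # naive RMQ
    return tour[k]


def intersect_antichains(a, b, tour, depths, first):
    """Similarity of two antichains via RMQs on consecutive elements."""
    merged = sorted(
        [(first[x], "A", x) for x in a] + [(first[x], "B", x) for x in b]
    )
    candidates = set()
    for (_, s1, x), (_, s2, y) in zip(merged, merged[1:]):
        if s1 != s2:  # only pairs coming from different sets
            candidates.add(lca(x, y, tour, depths, first))
    # Drop every candidate that is a proper ancestor of another candidate.
    return {
        c for c in candidates
        if not any(
            c != d and lca(c, d, tour, depths, first) == c for d in candidates
        )
    }


TOUR, DEPTHS, FIRST = euler_tour(TAXONOMY, "TOP")
RESULT = intersect_antichains(
    {"C1", "C5", "C8"}, {"C1", "C7", "C9"}, TOUR, DEPTHS, FIRST
)
```

On the taxonomy above, `DEPTHS` matches the array D of the slide and `RESULT` is the antichain made of C1 and C13. Replacing the linear scan in `lca` by a precomputed RMQ structure would not change the results, only the cost per query.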

Resulting Pattern Concept Lattice

[Figure: Pattern Concept lattice with concepts arranged in levels K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

  KID  Extent          Intent
  K1   s1, s2, s4, s5  (p1, {C14})
  K2   s1, s3, s4      (p1, {C12})
  K3   s1, s4          (p1, {C7, C12})
  K4   s2, s4, s5      (p1, {C8})
  K5   s3, s4          (p1, {C4})
  K6   s1              (p1, {C1, C2, C7})
  K8   s2, s5          (p1, {C8, C9})
  K7   s4              (p1, {C4, C7, C8})
  K9   s3              (p1, {C4, C5})
  K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

  Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
  DBLP             5293  33207  33198        10134    45s    21s
  Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

  Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
  BK       96        5           35   626   10           840897   37 sec     42 sec
  LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
  NT       131       3           131  140   6            128624   36 sec     68 sec
  PO       60        16          22   1236  58           416837   49 sec     57 sec
  PT       5000      49          22   4084  60           452316   50 sec     38 sec
  PW       200       11          94   436   21           1148656  60 sec     49 sec
  PY       74        28          36   340   53           771569   46 sec     40 sec
  QU       2178      4           44   8212  8            783013   28 sec     30 sec
  TZ       186       61          31   626   88           650041   58 sec     43 sec
  VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: Pattern Concept lattice with concepts arranged in levels K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11, shown with the table of extents and intents of K1 to K10]

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4 Creating Views over RDF-Graphs

SPARQL

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER(
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }

[Figure: graph pattern of the query, in which ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given the set of URIs $U$ and the set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: matching RDF graph with nodes NapoliLS97, title1, Amedeo_Napoli and Classification, and edges labeled dc:creator, dc:title and dc:subject]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

  ?title  ?keyword            ?author
  title1  RDF                 author1
  title2  Web Crawling        author1
  title2  Web Search Engines  author2
  title2  RDF                 author1
  title2  RDF                 author2
  title2  Web Crawling        author1

• $E_v$ = {?title}
• $A_v$ = {?author, ?keyword}
• $G = \mu(E_v)$ = {title1, title2, ...}
• $M_1 = \mu(A_{v1})$ = {author1, author2, ...}
• $M_2 = \mu(A_{v2})$ = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
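How a VIEW BY clause turns a result table into a formal context can be sketched in Python. VIEW BY is the extension proposed in this work and is not part of standard SPARQL, so the function `view_by` below is purely hypothetical: it takes the result rows as dictionaries, uses the values of the chosen variable as objects, and uses (variable, value) pairs of the remaining columns as attributes.

```python
# Result rows of the example query, one dictionary per solution mapping.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
]


def view_by(rows, object_var):
    """Build a formal context: object -> set of (variable, value) attributes."""
    context = {}
    for row in rows:
        attributes = context.setdefault(row[object_var], set())
        for var, value in row.items():
            if var != object_var:
                attributes.add((var, value))
    return context


BY_TITLE = view_by(ROWS, "title")    # objects are titles
BY_AUTHOR = view_by(ROWS, "author")  # same rows, objects are authors
```

Calling the function with a different variable gives another perspective on the same answers, which is the idea behind switching from `VIEW BY ?title` to `VIEW BY ?author`.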

Views from Different Perspectives

Example

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER(regex(?keyword, "Web Crawling", "i") ||
           regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

Example

  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER(regex(?keyword, "Web Crawling", "i") ||
           regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern in which ?x has rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }

French Films

[Figure: ?x linked by dcterms:subject to FrenchFilm, next to the pattern in which ?x has rdf:type Film and dbo:hasCountry France]

  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm .
  }

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France .
  }

From RDF Triples to Formal Context

RDF triples

  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientist>

  Predicates             Objects
  Index  URI             Index  URI
  A      dc:subject      a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
  B      dbp:award       c      dbp:TuringAward
  C      rdf:type        d      dbo:Scientist
  D      dbp:field       e      dbp:Computer Sciences
  E      dbp:birthPlace  f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

[Table: formal context with entities Person1 to Person7 as rows and the attributes a to g, grouped under the predicates A to E, as columns; Person1 has 6 crosses, Person2 and Person3 have 5, Person4 and Person5 have 4, Person6 and Person7 have 2]

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

  Rule                      Confidence  Support  Meaning
  $d \Rightarrow c$         100%        5        Every scientist has won a Turing Award
  $c \Rightarrow d$         100%        5        Every person who has won a Turing Award is a scientist
  $e \Rightarrow c, d$      100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
  $c, d \Rightarrow a, b$   100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
  $a, b \rightarrow c, d$   71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
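The reasoning on this slide can be sketched as a small Python check. The context below is an illustrative assumption restricted to the attributes a to d: Person1 to Person5 carry all four, Person6 and Person7 carry only a and b, which is consistent with the confidences and supports in the table. The function names and the 0.7 threshold are hypothetical choices made for the sketch.

```python
from fractions import Fraction

# Assumed context restricted to attributes a, b, c, d of the running example.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities that have every attribute in `attributes`."""
    return {e for e, attrs in CONTEXT.items() if set(attributes) <= attrs}


def confidence(x, y):
    """Confidence of the association rule X -> Y."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def completion_candidates(x, y, threshold):
    """Entities to complete if X => Y holds and conf(Y -> X) >= threshold.

    Returns the entities that satisfy Y but lack part of X, i.e. the ones
    preventing X and Y from forming a definition.
    """
    if not extent(x) <= extent(y):
        return set()  # X => Y does not hold
    if confidence(y, x) < threshold:
        return set()  # the converse rule is too weak
    return extent(y) - extent(x)


CANDIDATES = completion_candidates({"c", "d"}, {"a", "b"}, Fraction(7, 10))
```

Here the converse rule has confidence 5/7, roughly 0.71, and `CANDIDATES` contains Person6 and Person7. Whether the missing triples should actually be added is a decision left to the domain expert.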

Possible Scenario [Alam et al., IJCAI 2015]

[Flowchart: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

               Cars              Videogames   Smartphones       Countries
  Restriction  dc:subject        dc:subject   dc:subject        rdf:type
               dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
  Predicates   rdf:type          rdf:type     rdf:type          rdf:type
               dc:subject        dc:subject   dc:subject        dc:subject
               bodyStyle         cp           manufacturer      language
               transmission      developer    operativeSystem   governmentType
               assembly          requirement  developer         leaderType
               designer          genre        cpu               foundingDate
               layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

                 Cars   Videogames  Smartphones  Countries
  Subjects       529    655         363          3153
  Objects        1291   3265        495          8315
  Triples        12519  20146       4710         50000
  Concepts       14657  31031       1232         13754
  Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• There is no ground truth for the triples to be added to each dataset
• Human evaluation is used as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) plotted against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT, with nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edges dc:subject (twice), rdf:type, dbo:manufacturer, dbo:productionYear]

Why Pattern Structures

Contexts:   K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear
Attributes: a, b, c, d, e, f, g

Entity               Crosses        dbo:productionStartYear
Reventon             × × × × × ×    $\langle[2008, 2008]\rangle$
Countach             × × × × ×      $\langle[1974, 1974]\rangle$
350GT                × × × × ×      $\langle[1963, 1963]\rangle$
400GT                × × × ×        $\langle[1965, 1965]\rangle$
Islero               × × × ×        $\langle[1967, 1967]\rangle$
Veneno               × ×            $\langle[2012, 2012]\rangle$
Aventador Roadster   × ×            -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: taxonomy with nodes $\top$, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y : x \le y\}$, where $y$ ranges over the maximal elements of $Idl$
• The maximal elements of an ideal are not comparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
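The RMQ problem on the small array above can be illustrated with a short Python sketch. It assumes a sparse table, which needs $O(n \log n)$ preprocessing rather than the $O(n)$ scheme mentioned on the slide, but still answers each query in $O(1)$. The function names `build_sparse_table` and `rmq` are illustrative and not taken from the presentation.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    span = 1
    while 2 * span <= n:
        prev = table[-1]
        row = []
        for i in range(n - 2 * span + 1):
            left, right = prev[i], prev[i + span]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        span *= 2
    return table


def rmq(values, table, i, j):
    """1-based position of the minimal value in positions i..j (inclusive)."""
    lo, hi = i - 1, j - 1
    level = (hi - lo + 1).bit_length() - 1
    left = table[level][lo]
    right = table[level][hi - (1 << level) + 1]
    return (left if values[left] <= values[right] else right) + 1


array = [2, 1, 0, 3, 2]  # the array from the slide, positions 1..5
table = build_sparse_table(array)
print(rmq(array, table, 1, 5))
```

Positions are 1-based to match the slide; the two overlapping blocks of length $2^k$ cover the queried range, which is why a single comparison per query is enough.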

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:     $\top$ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 $\top$ C6 C13 C7 C13 C8 C13 C9 C13 C6 $\top$
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

An RMQ is computed for pairs of positions taken from $A$ and $B$, for example:

RMQ(4, 4) = 4, partial result $\{4\}$
RMQ(4, 18) = 15, partial result $\{4, 15\}$
RMQ(20, 18) = 19, partial result $\{4, 15, 19\}$

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree: $\top$ with subtrees C12 (C10 with leaves C1, C2; C11 with leaves C4, C5) and C6 (C13 with leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
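The filter-based intersection above can be reproduced with a small Python sketch. The `PARENT` map is an assumption: it encodes the tree as it can be read from the depth array used on the RMQ slides, with `"TOP"` standing for $\top$. The helper names are illustrative only.

```python
# Child -> parent map of the taxonomy tree ("TOP" stands for the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Fil(X): every element lying above at least one element of X."""
    result = set()
    for node in antichain:
        result |= up_set(node)
    return result


def most_specific(elements):
    """Drop every element that lies above another element of the set."""
    return {e for e in elements
            if not any(e != o and e in up_set(o) for o in elements)}


def intersect_by_filters(a, b):
    """Antichain obtained from Fil(a) ∩ Fil(b)."""
    return most_specific(filter_of(a) & filter_of(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect_by_filters(A, B)))
```

Each filter can grow to the size of the whole tree, which is the cost the complexity discussion attributes to the scaling approach.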

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 6: Interactive Knowledge Discovery over Web of Data

From Web of Documents to Web of Data

Characteristics of Web of Documents

• Unstructured Data
• Human Processable
• Not easily processed by machines

[Figure: RDF graph with nodes Leslie Lamport, Turing Award, Scientist, United States, Washington DC, English and edges dbo:birthPlace, dbp:award, rdf:type, dbo:capital, dbo:officialLanguage]

Google Knowledge Graph

Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect the major area of research of an author
• given a paper, is it possible to retrieve similar papers published

What do we need?

• A strong formalism which allows not only information retrieval but also classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

[Figure: Interactive Knowledge Discovery + Completion]

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I$ is a binary relation between $G$ and $M$
• Galois connection
  - $A' = \{m \in M \mid \forall g \in A, (g, m) \in I\}$ for $A \subseteq G$
  - $B' = \{g \in G \mid \forall m \in B, (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object Concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$
• Attribute Concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$

[Figure: Concept Lattice]
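The derivation operators and the two example concepts can be checked with a short Python sketch. The cross table in `CONTEXT` is an assumption: the column positions are reconstructed from the examples on the slides (object concept of g2, attribute concept of m1, and the rule statistics). The function names are illustrative.

```python
# Formal context: each object is mapped to the set of attributes it has.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def extent(attributes):
    """B': the objects that have every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, attrs in CONTEXT.items() if wanted <= attrs}


def object_concept(g):
    """(g'', g') for a single object g."""
    g_intent = intent({g})
    return extent(g_intent), g_intent


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    m_extent = extent({m})
    return m_extent, intent(m_extent)


print(object_concept("g2"))
print(attribute_concept("m1"))
```

Both calls return an (extent, intent) pair, so the results can be compared directly with the pairs given on the slide.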

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule $m_2 \rightarrow m_3$

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
$conf(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
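The support and confidence values above can be recomputed with a minimal Python sketch. It assumes the same cross table as the running example (reconstructed from the slide examples); `support`, `confidence` and `implication_holds` are illustrative names.

```python
from fractions import Fraction

# Formal context: each object is mapped to the set of attributes it has.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, attrs in CONTEXT.items() if wanted <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (X' is assumed to be non-empty)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}))
```

Exact fractions are used instead of floats so that 3/5 and 3/4 appear as on the slide.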

Complex Data

Formal Concept Analysis cannot deal with such complex data

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5

[Figure panels: Numeric Data; Graphs and Molecular Structure; Syntactic Tree; Linked Data]

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as follows:

• The maximal description representing the similarity of a set of objects: $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description: $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
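The interval pattern concept above can be verified with a small Python sketch. It assumes the three-object numeric table (g1 to g3) shown with the definition of pattern structures; the function names are illustrative and not part of any library.

```python
from functools import reduce

# Numeric context from the slide: one value per attribute m1, m2, m3.
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
}


def delta(g):
    """Description of g: one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity: component-wise smallest interval covering both inputs."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed(d, e):
    """d ⊑ e: every interval of e lies inside the matching interval of d."""
    return all(a1 <= a2 and b2 <= b1
               for (a1, b1), (a2, b2) in zip(d, e))


def common_description(objects):
    """A^□ for a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def matching_objects(d):
    """d^□: the objects whose description is subsumed by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = common_description(["g1", "g2"])
print(d, matching_objects(d))
```

Here $d \sqsubseteq \delta(g)$ is read as $d \sqcap \delta(g) = d$, which for intervals means that the values of $g$ fall inside the intervals of $d$.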

Complex Data

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5

[Figure panels: Interval Pattern Structures [Kaytoue et al. 2011]; Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]; Syntactic Tree [Leeuwenberg et al. 2015]; Linked Data]

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object

[Figure: Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$), where U = URIs, B = Blank Nodes, L = Literals]

Example of RDF Graph for Publications

[Figure: RDF graph for the publication NapoliLS97 with nodes title1, Amedeo_Napoli, Classification and edges dc:title, dc:creator, dc:subject]

To avoid confusion, we will call the objects in FCA entities

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy $T$ with nodes $\top$, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of this array

[Figure: taxonomy tree: $\top$ with subtrees C12 (C10 with leaves C1, C2; C11 with leaves C4, C5) and C6 (C13 with leaves C7, C8, C9)]

The array D is filled one entry at a time while the tree is traversed: each visit of a node appends the depth of that node

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:     $\top$ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 $\top$ C6 C13 C7 C13 C8 C13 C9 C13 C6 $\top$
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:     $\top$ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 $\top$ C6 C13 C7 C13 C8 C13 C9 C13 C6 $\top$
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
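The procedure on this slide can be sketched in Python as follows. Several points are assumptions made for illustration: the `TREE` map is read off the depth array, `"TOP"` stands for $\top$, the RMQ is a plain linear scan rather than a constant-time structure, and only consecutive positions coming from different sets are queried (on the slide's example every consecutive pair happens to be of that kind). Internally positions are 0-based, so slide position 4 is index 3.

```python
# Parent -> ordered children; "TOP" stands for the root of the taxonomy.
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Visit sequence of the tree and the depth of each visited node."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in TREE.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # come back to the parent
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")
FIRST = {}
for index, name in enumerate(NODES):
    FIRST.setdefault(name, index)   # first occurrence of each node


def rmq(i, j):
    """Index of the smallest depth between i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=DEPTHS.__getitem__)


def lca(a, b):
    return NODES[rmq(FIRST[a], FIRST[b])]


def intersect(a, b):
    """Most specific common subsumers of the antichains a and b."""
    tagged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[x], "B") for x in b])
    found = set()
    for (p, tag_p), (q, tag_q) in zip(tagged, tagged[1:]):
        if tag_p != tag_q:
            found.add(NODES[rmq(p, q)])
    # keep a node only if it is not a proper ancestor of another found node
    return {c for c in found
            if not any(c != o and lca(c, o) == c for o in found)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

The final filtering step corresponds to the reduction of $\{4, 8, 15, 19, 21\}$ to $\{4, 21\}$ on the slide, i.e., to the nodes C1 and C13.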

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy with nodes $\top$, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

[Figure: Pattern Concept lattice with nodes K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: concept lattice with concepts K0 to K11]

KID   Extent             Intent
K1    {s1, s2, s4, s5}   (p1, {C14})
K2    {s1, s3, s4}       (p1, {C12})
K3    {s1, s4}           (p1, {C7, C12})
K4    {s2, s4, s5}       (p1, {C8})
K5    {s3, s4}           (p1, {C4})
K6    {s1}               (p1, {C1, C2, C7})
K8    {s2, s5}           (p1, {C8, C9})
K7    {s4}               (p1, {C4, C7, C8})
K9    {s3}               (p1, {C4, C5})
K10   {s2}               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs


SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern in which ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g., $\mu_1$ : ?keywords $\rightarrow$ Classification

[Figure: the graph pattern instantiated by a mapping, in which NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
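The way a VIEW BY clause splits the query answers into an object set and one attribute set per remaining variable can be sketched in Python. This is only an illustration under stated assumptions: `build_view_contexts` is a hypothetical helper, the solutions are plain dictionaries rather than the output of a real SPARQL engine, and the three rows are a subset of the answer table on the slide.

```python
from collections import defaultdict


def build_view_contexts(rows, view_by):
    """Split query solutions into objects and one incidence relation per attribute variable.

    rows    -- list of dicts, one per solution, mapping variable names to values
    view_by -- name of the variable chosen as object variable (the VIEW BY variable)
    """
    objects = set()
    incidence = defaultdict(set)  # attribute variable -> set of (object, value) pairs
    for row in rows:
        g = row[view_by]
        objects.add(g)
        for variable, value in row.items():
            if variable != view_by:
                incidence[variable].add((g, value))
    return objects, dict(incidence)


# Hypothetical subset of the answers listed on the slide.
rows = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]

G, I = build_view_contexts(rows, "title")
M1 = {value for _, value in I["author"]}   # attribute values bound to ?author
M2 = {value for _, value in I["keyword"]}  # attribute values bound to ?keyword
```

Calling the same helper with `view_by="author"` yields the view from the author perspective without changing the query answers.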


Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern in which x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate that is ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: x linked to FrenchFilm by dcterms:subject]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: x linked to FrenchFilm by dcterms:subject, to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom

Columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
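The scaling step described above can be sketched in Python: every distinct (predicate, object) pair becomes one attribute, and every triple becomes one cross. The helper name `triples_to_context` is hypothetical, and the triples are the four listed on the slide, written as plain string tuples rather than parsed RDF.

```python
def triples_to_context(triples):
    """Build a formal context (G, M, I) from (subject, predicate, object) triples.

    Each attribute is a (predicate, object) pair, so a cross in the context
    corresponds to exactly one triple.
    """
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]

G, M, I = triples_to_context(triples)
```

The two dc:subject triples yield two separate attributes, which is why predicate A spans the two columns a and b in the context.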


Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                          Confidence   Support   Meaning
$d \Rightarrow c$             100%         5         Every scientist has won a Turing Award
$c \Rightarrow d$             100%         5         Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$          100%         2         All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$       100%         5         All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$       71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
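The confidence figures above can be checked with a short Python sketch. The context below is an assumption: the transcript does not preserve which column each cross belongs to, so the attribute sets were chosen only to agree with the cross counts per person and with the supports and confidences in the rule table. The helper names `extent`, `support` and `confidence` are hypothetical.

```python
# Hypothetical context consistent with the rule table (column positions are assumed).
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "f"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities whose description contains every attribute of the given set."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def support(attributes):
    """Number of entities having all the given attributes."""
    return len(extent(attributes))


def confidence(premise, conclusion):
    """Confidence of the association rule premise -> conclusion."""
    return support(premise | conclusion) / support(premise)


conf_ab_cd = confidence({"a", "b"}, {"c", "d"})   # 5 / 7
conf_cd_ab = confidence({"c", "d"}, {"a", "b"})   # 5 / 5
incomplete = extent({"a", "b"}) - extent({"c", "d"})
```

The entities in `incomplete` are the ones that break the converse rule, which is how Person6 and Person7 are singled out as candidates for completion.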


Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow going from Reference Universe to Formal Context, Mining Implications and Ranking Implications, then the test "Can a rule be a definition $X \equiv Y$?"; on "Yes" the step is Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Restriction                  Predicates
Cars          dc:subject dbpc:Sports_cars  rdf:type, dc:subject, bodyStyle, transmission, assembly, designer, layout
Videogames    dc:subject dbpc:FPS          rdf:type, dc:subject, cp, developer, requirement, genre, releaseDate
Smartphones   dc:subject dbpc:Smartphones  rdf:type, dc:subject, manufacturer, operativeSystem, developer, cpu
Countries     rdf:type Country             rdf:type, dc:subject, language, govenmentType, leaderType, foundingDate, gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

                Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for Cars, Smartphones, Countries and Videogames]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph of 350GT with dc:subject dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini and dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

K_A  K_B  K_C  K_D  K_E  K_dbo:productionStartYear
a b c d e f g

Reventon             × × × × × ×   ⟨[2008, 2008]⟩
Countach             × × × × ×     ⟨[1974, 1974]⟩
350GT                × × × × ×     ⟨[1963, 1963]⟩
400GT                × × × ×       ⟨[1965, 1965]⟩
Islero               × × × ×       ⟨[1967, 1967]⟩
Veneno               × ×           ⟨[2012, 2012]⟩
Aventador Roadster   × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$ in the heterogeneous context with numeric values, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = [1963, 1967]$ and $[1963, 1967]^{\square} = A_1$
  - $(A_1)^{\square\square} = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
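The numeric part of this closure can be sketched in Python with interval patterns, where the description of a set of entities is the smallest interval covering their years. The helper names `interval_of` and `entities_in` are hypothetical. Because the transcript does not preserve the column of each cross, the binary part is not recomputed here; the set `HAS_ABCD` is simply the extent of {a, b, c, d} stated on the slide. Aventador Roadster has no year in the table and is left out.

```python
# Production start years taken from the heterogeneous context.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}

# Extent of {a, b, c, d} as given on the slide (not recomputed from crosses).
HAS_ABCD = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_of(entities):
    """Smallest interval [lo, hi] containing the years of all given entities."""
    years = [YEARS[g] for g in entities]
    return (min(years), max(years))


def entities_in(interval):
    """Entities whose year falls inside the given interval."""
    lo, hi = interval
    return {g for g, year in YEARS.items() if lo <= year <= hi}


A1 = {"350GT", "400GT", "Islero"}
pattern = interval_of(A1)                 # numeric description of A1
numeric_closure = entities_in(pattern)    # entities matching that description
heterogeneous_closure = HAS_ABCD & numeric_closure
```

Intersecting the two closures is what removes Reventon and Countach: they share the binary attributes of A1 but were produced outside the interval.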


The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x               x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: attribute taxonomy rooted at ⊤ over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $\mathrm{Idl}$ can be described by the set of its maximal elements: $\mathrm{Idl} = \{x \mid \exists y \in M : x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$
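The infimum of two antichains can be computed directly from this definition, as in the following Python sketch. It is a brute-force illustration only: the names `maximal_elements` and `antichain_meet` are hypothetical, and the example poset (the divisors of 12 ordered by divisibility) is an assumption chosen because it is small and easy to check by hand, not a poset from the slides.

```python
def maximal_elements(elements, leq):
    """Elements of the set that lie strictly below no other element of the set."""
    return {x for x in elements
            if not any(x != y and leq(x, y) for y in elements)}


def antichain_meet(ac1, ac2, universe, leq):
    """Infimum of two antichains: maximal elements of the common order ideal."""
    ideal = {m for m in universe
             if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)}
    return maximal_elements(ideal, leq)


# Hypothetical example poset: divisors of 12, where x <= y means "x divides y".
UNIVERSE = {1, 2, 3, 4, 6, 12}


def divides(x, y):
    return y % x == 0


meet_4_6 = antichain_meet({4}, {6}, UNIVERSE, divides)
meet_46_6 = antichain_meet({4, 6}, {6}, UNIVERSE, divides)
```

For {4} and {6} the common ideal is {1, 2}, whose only maximal element is 2; for {4, 6} and {6} the common ideal is {1, 2, 3, 6}, whose only maximal element is 6.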


Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array       [ 2 1 0 3 2 ]
Positions     1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
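A common way to answer range minimum queries in constant time is a sparse table, sketched below in Python. Note the swap in technique: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted above, which requires a more involved scheme. The class name `SparseTableRMQ` and its method names are hypothetical; positions are 1-based to match the slides.

```python
class SparseTableRMQ:
    """Range minimum queries with O(n log n) preprocessing and O(1) per query."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] is the index of the minimum of values[i : i + 2**k].
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, i, j):
        """Return the 1-based position of the minimum between positions i and j."""
        lo, hi = min(i, j) - 1, max(i, j) - 1
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping windows of length 2**k cover the whole range.
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        best = left if self.values[left] <= self.values[right] else right
        return best + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
position = rmq.query(1, 5)
```

On the example array the minimum value 0 sits at position 3, so any range that contains position 3 returns 3.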


Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
RMQ(4, 4) = 4, giving $\{4\}$
RMQ(4, 18) = 15, giving $\{4, 15\}$
RMQ(20, 18) = 19, giving $\{4, 15, 19\}$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy rooted at ⊤ over the classes C1, C2, C4 to C13]

$A = \{C1, C5, C8\}$, $\mathrm{Fil}(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $\mathrm{Fil}(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
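The filter-based intersection can be reproduced with a few lines of Python. The sketch below is an illustration under stated assumptions: the parent map `PARENT` is reconstructed from the Euler tour used in the RMQ slides (the drawing itself is not preserved in the transcript), the root ⊤ is written `"TOP"`, and the helper names `up_set`, `filter_of` and `minimal_elements` are hypothetical.

```python
# Tree reconstructed from the Euler tour of the RMQ slides (an assumption).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all its ancestors up to the root."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(c) for c in antichain))


def minimal_elements(nodes):
    """Nodes of the set that have no other node of the set strictly below them."""
    return {n for n in nodes
            if not any(m != n and n in up_set(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
antichain = minimal_elements(common)
```

Each filter is materialized in full here, which is the cost that the complexity discussion attributes to the scaling approach.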


Complexity

Complexity of Naive Approach

The number of RMQs, one per pair of elements, is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect the major area of research of an author
• given a paper, is it possible to retrieve similar papers published?

What do we need

• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery
• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over the Web of Data
  - Classifying RDF data
  - Creating views through SPARQL queries

• Ways to utilize this structure
  - Data completion
  - Data analysis: interactive exploration and KDD through visualization

Interactive Knowledge Discovery + Completion

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]

• Navigala [Visani et al. 2011]

• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation

• Galois connection

  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$

  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$

• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
  g1  ×
  g2  ×   ×
  g3      ×   ×
  g4      ×   ×
  g5  ×   ×   ×

[Figure: concept lattice of this context]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$ for $g_2$

• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$ for $m_1$
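The derivation operators and the two example concepts can be checked with a minimal Python sketch. The cross positions in `CONTEXT` are not given explicitly in the transcript; they are inferred here from the examples on the slides, and the function names are illustrative only.

```python
# Formal context of the running example. The incidence relation is an
# assumption reconstructed from the slide examples (object concept of g2,
# attribute concept of m1, and the rule statistics for m2 and m3).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """(g'', g') for a single object g."""
    g_intent = intent({g})
    return extent(g_intent), g_intent


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    m_extent = extent({m})
    return m_extent, intent(m_extent)


print(object_concept("g2"))
print(attribute_concept("m1"))
```

Both calls reproduce the concepts quoted on the slide, which is also what supports the reconstructed cross table.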

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association rule $m_2 \rightarrow m_3$
  - $\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
  - $\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$
  - $\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
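The support and confidence values above can be reproduced with a short Python sketch. The context is the running example with cross positions inferred from the slide examples (an assumption), and the helper names are hypothetical. It uses the identity $X' \cap Y' = (X \cup Y)'$.

```python
from fractions import Fraction

# Running example; the incidence relation is reconstructed, not quoted.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """Objects having all the given attributes."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|."""
    return Fraction(len(extent(x | y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (undefined when no object has X)."""
    return Fraction(len(extent(x | y)), len(extent(x)))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}))
```

Exact fractions are used instead of floats so that 3/5 and 3/4 can be compared without rounding.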

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric data

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5
  g4  4   9   8
  g5  5   8   5

• Graphs and molecular structures

• Syntactic trees

• Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, a set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
  g1  5   7   6
  g2  6   8   4
  g3  4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
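As a small illustration of the interval similarity operation, the following Python sketch computes $\delta(g_1) \sqcap \delta(g_2)$ for the table above. Representing a description as a list of `(low, high)` tuples is a choice made for this sketch, not something prescribed by the slides.

```python
def delta(values):
    """Describe an object by one degenerate interval [v, v] per attribute."""
    return [(v, v) for v in values]


def meet(d1, d2):
    """Similarity of two interval descriptions: the componentwise convex hull."""
    return [
        (min(a1, a2), max(b1, b2))
        for (a1, b1), (a2, b2) in zip(d1, d2)
    ]


g1 = delta([5, 7, 6])
g2 = delta([6, 8, 4])
print(meet(g1, g2))
```

The operation is idempotent, commutative, and associative, which is what allows it to play the role of set intersection in a pattern structure.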

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as follows:

• The maximal description representing the similarity of a set of objects:

  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
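The two derivation operators of the pattern structure can be sketched in Python on the three-object numeric table (g1, g2, g3). This is an illustrative sketch: the function names are hypothetical, and subsumption is tested through the usual equivalence $d \sqsubseteq e \iff d \sqcap e = d$.

```python
from functools import reduce

# Three-object numeric table from the interval pattern structure example.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of object g as degenerate intervals."""
    return [(v, v) for v in DATA[g]]


def meet(d1, d2):
    """Componentwise convex hull of two interval descriptions."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]


def subsumed(d, e):
    """d ⊑ e, i.e. d ⊓ e == d (every interval of e lies inside d's)."""
    return meet(d, e) == d


def description(objects):
    """A□: similarity of all descriptions in a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def pattern_extent(d):
    """d□: all objects whose description is subsumed by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = description({"g1", "g2"})
print(d, pattern_extent(d))
```

Because `pattern_extent(description({"g1", "g2"}))` returns exactly `{"g1", "g2"}`, the pair forms a pattern concept, as stated on the slide.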

Complex Data

• Numeric data: interval pattern structures [Kaytoue et al. 2011]

• Graphs and molecular structures [Kuznetsov and Samokhin 2005]

• Syntactic trees [Leeuwenberg et al. 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources contain only RDF triples

• Some resources contain only schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

• Building a classification structure over the Web of Data
  - Classifying RDF data
  - Creating views through SPARQL queries

• Ways to utilize this structure
  - Data completion
  - Data analysis: interactive exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$, and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate, and $o$ is an object.

[Figure: a triple drawn as Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$); U = URIs, B = blank nodes, L = literals]

Example of RDF Graph for Publications

[Figure: RDF graph in which NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title, and to Classification by dc:subject]

To avoid confusion, we will refer to objects in FCA as entities.
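The position constraints in the definition of an RDF triple can be illustrated with a small Python sketch. The classes `URI`, `BlankNode`, and `Literal` and the helper `make_triple` are hypothetical names introduced for illustration; they are not taken from any RDF library. The example graph encodes the publication figure, with the prefixed names kept as plain strings.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier (element of U)."""


class BlankNode(str):
    """An anonymous node (element of B)."""


class Literal(str):
    """A literal value (element of L)."""


class Triple(NamedTuple):
    s: Union[URI, BlankNode]
    p: URI
    o: Union[URI, BlankNode, Literal]


def make_triple(s, p, o):
    """Build a triple, rejecting terms not allowed in their position."""
    if not isinstance(s, (URI, BlankNode)):
        raise TypeError("subject must be a URI or a blank node")
    if not isinstance(p, URI):
        raise TypeError("predicate must be a URI")
    if not isinstance(o, (URI, BlankNode, Literal)):
        raise TypeError("object must be a URI, a blank node, or a literal")
    return Triple(s, p, o)


# The publication example, assuming title1 is a literal.
graph = [
    make_triple(URI("NapoliLS97"), URI("dc:creator"), URI("Amedeo_Napoli")),
    make_triple(URI("NapoliLS97"), URI("dc:title"), Literal("title1")),
    make_triple(URI("NapoliLS97"), URI("dc:subject"), URI("Classification")),
]
print(len(graph), graph[0])
```

A literal in subject position, for example `make_triple(Literal("title1"), URI("dc:title"), URI("NapoliLS97"))`, raises `TypeError`, mirroring the restriction $s \in U \cup B$.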

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy $T$ with top element $\top$ and the classes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of structured attribute sets?

• Scaling

• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes

• Range Minimum Query: an implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: tree with root $\top$; $\top$ has the children C12 and C6; C12 has the children C10 (with leaves C1 and C2) and C11 (with leaves C4 and C5); C6 has the child C13 (with leaves C7, C8, and C9)]

The tree is traversed depth-first, and each visited node is recorded together with its depth D:

Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
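The construction of the node and depth arrays can be sketched in Python as an Euler tour of the tree. The dictionary `TREE` encodes the tree as it can be read off the node sequence on the slide; the name `"T"` stands for the top element and, like the function name, is an assumption of this sketch.

```python
# Child lists of the example tree; "T" stands for the top element.
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return (nodes, depths): every node is recorded on first entry and
    again each time the traversal returns to it from a child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour(TREE, "T")
print(depths)
print(nodes.index("C1") + 1, nodes.index("C5") + 1, nodes.index("C8") + 1)
```

The least common ancestor of two nodes is then the node of minimal depth between their positions in the tour, which is why an RMQ over `depths` is sufficient.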

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

Index  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Replacing each class by its index in the array gives:

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19, and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. $\{C_1, C_{13}\}$

Positions 8 (C12) and 15 ($\top$) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so only the most specific classes remain.
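The worked example can be replayed with the following Python sketch. It is a simplified reading of the procedure, not the authors' implementation: the RMQ is a naive linear scan (a sparse table or similar structure would normally be used), indices are 0-based internally, `"T"` stands for the top element, and only neighbouring elements that come from different sets are compared.

```python
NODES = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last position of every node in the tour (0-based).
FIRST, LAST = {}, {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)
    LAST[node] = position


def rmq(i, j):
    """Position of the minimal depth in DEPTH[i..j] (naive scan)."""
    return min(range(i, j + 1), key=DEPTH.__getitem__)


def is_ancestor(u, v):
    """True if u is an ancestor of v, or u == v."""
    return FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def similarity(a, b):
    """Most specific common ancestors of two sets of tree nodes."""
    tagged = sorted([(FIRST[n], "A") for n in a] + [(FIRST[n], "B") for n in b])
    candidates = set()
    for (i, source_i), (j, source_j) in zip(tagged, tagged[1:]):
        if source_i != source_j:
            candidates.add(NODES[rmq(i, j)])
    # Drop every candidate that is a proper ancestor of another candidate.
    return {
        u for u in candidates
        if not any(u != v and is_ancestor(u, v) for v in candidates)
    }


print(similarity({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

The intermediate candidates are C1, C12, the top element, and C13, matching the positions 4, 8, 15, 19, and 21 of the slide; the final filter removes C12 and the top element.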

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: pattern concept lattice over the concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of the pattern concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through interordinal scaling

• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$

• Not all datasets could be processed using the scaling approach ()

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice over the concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, in which ?paper is linked to ?author by dc:creator, to ?title by dc:title, and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given the set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: matching RDF graph, in which NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title, and to Classification by dc:subject]

Lattice-Based View Access [Alam et al. CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{\text{?title}\}$

• $A_v = \{\text{?author}, \text{?keyword}\}$

• $G = \mu(E_v) = \{\text{title1}, \text{title2}, \ldots\}$

• $M_1 = \mu(A_{v_1}) = \{\text{author1}, \text{author2}, \ldots\}$

• $M_2 = \mu(A_{v_2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
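The effect of VIEW BY on a result set can be sketched in Python: the values bound to the chosen variable become the objects of a formal context, and the values of the remaining variables become its attributes. The rows below are illustrative bindings (an assumption, not the exact result table), and `view_by` is a hypothetical helper, not part of any SPARQL engine.

```python
# Illustrative solution mappings of the query, one dict per result row.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, entity_variable):
    """Group rows by `entity_variable`; every other (variable, value)
    pair of a row becomes an attribute of the corresponding object."""
    context = {}
    for row in rows:
        attributes = context.setdefault(row[entity_variable], set())
        for variable, value in row.items():
            if variable != entity_variable:
                attributes.add((variable, value))
    return context


by_title = view_by(ROWS, "title")    # VIEW BY ?title
by_author = view_by(ROWS, "author")  # VIEW BY ?author
print(sorted(by_title["title2"]))
print(sorted(by_author["author1"]))
```

Switching the VIEW BY variable changes which column supplies the objects, so the same answers yield a different context and therefore a different lattice.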

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern in which ?x has rdf:type Person, dbp:birthPlace Berlin, and a dbp:birthDate that is ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern in which ?x has dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: the same pattern extended so that ?x also has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

A B C D E
a b c d e f g

Person1 × × × × × ×

Person2 × × × × ×

Person3 × × × × ×

Person4 × × × ×

Person5 × × × ×

Person6 × ×

Person7 × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
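The scaling step from triples to a formal context can be sketched in Python: every subject becomes an object of the context and every (predicate, object) pair becomes a binary attribute. Only the four triples listed for Person1 are used; `build_context` is a hypothetical helper name, and the prefixed names are kept as plain strings.

```python
# The four example triples about Person1, written as (s, p, o) tuples.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def build_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has.
    Each stored pair corresponds to one cross in the formal context."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


context = build_context(TRIPLES)
for attribute in sorted(context["Person1"]):
    print(attribute)
```

Pairing the predicate with the object keeps attributes such as `dc:subject` with different values apart, which is what the two-level header (A to E over a to g) of the table expresses.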

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$

• $X \equiv Y$ is called a definition

• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incomplete data

Association rules for the running example:

• $d \Rightarrow c$ (confidence 100%, support 5): every scientist has won a Turing Award

• $c \Rightarrow d$ (confidence 100%, support 5): every person who has won a Turing Award is a scientist

• $e \Rightarrow c, d$ (confidence 100%, support 2): every person whose field is computer science is a scientist who has won a Turing Award

• $c, d \Rightarrow a, b$ (confidence 100%, support 5): all scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists

• $a, b \rightarrow c, d$ (confidence 71%, support 7): 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description

• In fact, there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
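The reasoning that singles out Person6 and Person7 can be sketched in Python. The context below is restricted to the attributes a to d, with crosses chosen to agree with the supports and confidences in the rule list (an assumption, since the exact cross positions are not recoverable); the threshold and the helper names are illustrative.

```python
# Running example restricted to attributes a, b, c, d (assumed crosses).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities having all the given attributes."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def confidence(x, y):
    """Confidence of the association rule x -> y."""
    return len(extent(x | y)) / len(extent(x))


def completion_candidates(x, y, threshold):
    """If y => x holds and x -> y is confident enough, return the entities
    that have x but lack y; these are candidates for completion."""
    if extent(y) <= extent(x) and confidence(x, y) >= threshold:
        return extent(x) - extent(y)
    return set()


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(sorted(completion_candidates({"a", "b"}, {"c", "d"}, 0.7)))
```

Adding c and d to the two returned entities would turn the rule into an implication, so that a, b and c, d define each other.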

Possible Scenario [Alam et al. IJCAI 2015]

[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition $X \equiv Y$?", and, if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• There is no ground truth for the triples to be added to each dataset

• Human evaluation is used as the ground truth; each list of implications has 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries, and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team

• Altering the navigation space

• Navigation across points of view

• Hiding non-interesting parts of the lattice

• Search capabilities

7 Conclusion and Perspectives

Conclusion

• Building a classification structure over the Web of Data
  - Classifying RDF data with RDF-Pattern Structures
  - Creating views through SPARQL queries: Lattice-Based View Access

• Ways to utilize this structure
  - Completing RDF data using association rule mining
  - Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. when $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments

Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may contain different data types and structures, including dates, numbers, collections, and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT. Nodes: dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini, and the literal 1964 (xsd:date). Edge labels: dc:subject (twice), rdf:type, dbo:manufacturer, dbo:productionYear.]

Why Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear
Attributes: a b c d e f g

Reventon: × × × × × ×, ⟨[2008, 2008]⟩
Countach: × × × × ×, ⟨[1974, 1974]⟩
350GT: × × × × ×, ⟨[1963, 1963]⟩
400GT: × × × ×, ⟨[1965, 1965]⟩
Islero: × × × ×, ⟨[1967, 1967]⟩
Veneno: × ×, ⟨[2012, 2012]⟩
Aventador Roadster: × ×, -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\square\square}$:
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$:
  - $(A_1)' = \langle [1963, 1967] \rangle$ and $\langle [1963, 1967] \rangle' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^{*}$, organized in a subsumption hierarchy based on a partial ordering $\le_{M^{*}}$
• Extended intersection operation on $s1 = \{C1, C2, C7\}$ and $s2 = \{C6, C8, C9\}$

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1         | x  | x  |    |    |    | x  |    |
s2         |    |    |    |    | x  |    | x  | x
s3         |    |    | x  | x  |    |    |    |
s4         |    |    | x  |    |    | x  | x  |
s5         |    |    |    |    |    |    | x  | x

[Figure: subsumption hierarchy over the classes ⊤, C1, C2, and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in \max(Idl) : x \le y\}$
• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
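To make the RMQ problem concrete, here is a minimal Python sketch on the five-element array above. It uses a sparse table, which is an assumption made for brevity: it needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted on the slide, but still answers each query in $O(1)$. The function names `build_sparse_table` and `rmq` are illustrative only, and indices are 0-based.

```python
from typing import List


def build_sparse_table(values: List[int]) -> List[List[int]]:
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # blocks of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values: List[int], table: List[List[int]], lo: int, hi: int) -> int:
    """0-based index of a minimum of values[lo..hi] (both ends inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
position = rmq(array, table, 0, 4) + 1  # convert to the slide's 1-based positions
```

On this array the minimum over the whole range is the value 0, found at 1-based position 3.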

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D:       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Pos:     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

RMQs are computed for pairs of one position from $A$ and one from $B$, for instance:
- RMQ(4, 4) = 4, giving $\{4\}$
- RMQ(4, 18) = 15, giving $\{4, 15\}$
- RMQ(20, 18) = 19, giving $\{4, 15, 19\}$

Result = $\{4, 8, 15, 19, 21\}$, which reduces to the antichain $\{4, 21\}$, i.e., $\{C1, C13\}$


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
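The filter-based scaling can be sketched in a few lines of Python. The `PARENT` map below is an assumption: it encodes the taxonomy as it can be read off the Euler tour used elsewhere in the deck, with `"TOP"` standing for ⊤. The helper names `filter_of` and `most_specific` are hypothetical.

```python
# Child -> parent map of the example taxonomy (assumed structure, "TOP" is the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All classes equal to or more general than some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)  # None once the root has been passed
    return result


def most_specific(classes):
    """Drop every class that is a proper ancestor of another class in the set."""
    ancestors = set()
    for node in classes:
        ancestors |= filter_of([node]) - {node}
    return set(classes) - ancestors


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
meet = most_specific(common)
```

Intersecting the two filters and keeping only the most specific classes gives back the antichain reported on the slide, {C1, C13}. Note that each filter may contain up to all nodes of the taxonomy, which is the cost the deck contrasts with the RMQ approach.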

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 8: Interactive Knowledge Discovery over Web of Data

Google Knowledge Graph


Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect the major area of research of an author
• given a paper, is it possible to retrieve similar papers published

What do we need?

• A strong formalism which allows not only information retrieval but also classification and knowledge discovery
• A structure which:
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

[Figure: Interactive Knowledge Discovery + Completion]


Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al 2011]
• Relational Exploration [Rudolph 2006]

2 Background


Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$:
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection:
  - $A' = \{m \in M \mid \forall g \in A : (g, m) \in I\}$ for $A \subseteq G$
  - $B' = \{g \in G \mid \forall m \in B : (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling:
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

   | m1 | m2 | m3
g1 | ×  |    |
g2 | ×  | ×  |
g3 |    | ×  | ×
g4 |    | ×  | ×
g5 | ×  | ×  | ×

[Figure: Concept Lattice]
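The two derivation operators can be sketched directly in Python. The `CONTEXT` dictionary is an assumption: the crosses of the example table lost their columns in this transcript, so the incidence below is the one consistent with the worked examples in the deck (the object concept of g2, the attribute concept of m1, and the support and confidence values of the association-rule slide). The brute-force `concepts` enumeration is only meant for toy contexts.

```python
from itertools import combinations

# Assumed incidence relation of the 5 x 3 example context.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in A."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def extent(attributes):
    """B': the objects that have every attribute in B."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def concepts():
    """All formal concepts, obtained by closing every attribute subset."""
    found = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), size):
            ext = extent(subset)
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found


# Object concept of g2: (g2'', g2')
object_concept_g2 = (extent(intent({"g2"})), intent({"g2"}))
```

With this context, the object concept of g2 is ({g2, g5}, {m1, m2}) and the extent of {m1} is {g1, g2, g5}.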

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for $g \in G$:
  - The object concept is given as $(g'', g')$
  - e.g., $(\{g2, g5\}, \{m1, m2\})$ for $g2$
• Attribute concept for $m \in M$:
  - The attribute concept is given as $(m', m'')$
  - e.g., $(\{g1, g2, g5\}, \{m1\})$ for $m1$

[Figure: Concept Lattice]

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule $m2 \rightarrow m3$

  $\sigma(m2 \rightarrow m3) = 3/5 = 60\%$
  $\mathrm{conf}(m2 \rightarrow m3) = 3/4 = 75\%$

Implication $m3 \Rightarrow m2$

  $\mathrm{conf}(m3 \Rightarrow m2) = 3/3 = 100\%$
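The support and confidence formulas can be checked with a short Python sketch. As before, the `CONTEXT` incidence is an assumption reconstructed from the numbers quoted on the slides, and the function names are illustrative. Exact fractions are used so that 3/5 and 3/4 come out as such.

```python
from fractions import Fraction

# Assumed incidence relation of the example context (objects -> attributes).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in X."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' n Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' n Y'| / |X'| (undefined, and raising ZeroDivisionError, when X' is empty)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def is_implication(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


sup = support({"m2"}, {"m3"})      # 3/5
conf = confidence({"m2"}, {"m3"})  # 3/4
```

An implication is simply an association rule with confidence 1: here `is_implication({"m3"}, {"m2"})` holds, while the converse does not because g2 has m2 but not m3.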


Complex Data

Formal Concept Analysis cannot deal with such complex data:

   | m1 | m2 | m3
g1 | 5  | 7  | 6
g2 | 6  | 8  | 4
g3 | 4  | 8  | 5
g4 | 4  | 9  | 8
g5 | 5  | 8  | 5

Numeric Data

Graphs and Molecular Structure

Syntactic Tree

Linked Data


Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of:
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

   | m1 | m2 | m3
g1 | 5  | 7  | 6
g2 | 6  | 8  | 4
g3 | 4  | 8  | 5

• $\delta(g1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g1) \sqcap \delta(g2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:

• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g1, g2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g1, g2\}$
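As a rough illustration of interval pattern structures, the following Python sketch implements the interval similarity and the two derivation operators on the three-object table from the previous slide. The names `delta`, `meet`, `subsumes`, `description`, and `extent` are illustrative choices, and the subsumption test uses the standard semilattice definition $d \sqsubseteq e \iff d \sqcap e = d$.

```python
from functools import reduce

# Numeric table from the slide: object -> values of (m1, m2, m3).
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Describe an object by one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Component-wise similarity: the smallest interval covering both intervals."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumes(d, e):
    """d is subsumed by e in the semilattice order iff meet(d, e) == d."""
    return meet(d, e) == d


def description(objects):
    """Similarity of a non-empty set of objects: the meet of their descriptions."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d):
    """All objects whose description is covered by the interval pattern d."""
    return {g for g in DATA if subsumes(d, delta(g))}


pattern = description({"g1", "g2"})
```

Here `pattern` is ((5, 6), (7, 8), (4, 6)) and `extent(pattern)` is {g1, g2}, since g3 has the value 4 for m1, which falls outside [5, 6]. The pair therefore forms a pattern concept, as stated on the slide.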

Complex Data

   | m1 | m2 | m3
g1 | 5  | 7  | 6
g2 | 6  | 8  | 4
g3 | 4  | 8  | 5
g4 | 4  | 9  | 8
g5 | 5  | 8  | 5

Interval Pattern Structures [Kaytoue et al 2011]

Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

Syntactic Tree [Leeuwenberg et al 2015]

Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries
• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$, and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate, and $o$ is an object.

[Figure: a triple drawn as Subject (in $U \cup B$) linked by a predicate (in $U$) to Object (in $U \cup B \cup L$)]

U = URI, B = Blank Nodes, L = Literal

Resource Description Framework

Example of RDF Graph for Publications

[Figure: RDF graph with nodes NapoliLS97, title1, Amedeo_Napoli, and Classification; edge labels dc:creator, dc:title, dc:subject]

To avoid confusion, we will call the objects of FCA entities.
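The membership constraint in the definition of an RDF triple can be illustrated with a small Python sketch. The classes `URI`, `BlankNode`, and `Literal` and the function `is_valid_triple` are hypothetical names chosen for this example, not part of any RDF library. The sample triples assume that, in the publication graph, NapoliLS97 is the subject of all three edges; compact names such as `dc:creator` stand in for full URIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o) -> bool:
    """Check (s, p, o) against (U u B) x U x (U u B u L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


# Assumed reading of the example graph for publications.
triples = [
    (URI("NapoliLS97"), URI("dc:creator"), URI("Amedeo_Napoli")),
    (URI("NapoliLS97"), URI("dc:title"), Literal("title1")),
    (URI("NapoliLS97"), URI("dc:subject"), URI("Classification")),
]
all_valid = all(is_valid_triple(*t) for t in triples)
```

A literal may only appear in the object position, so a triple such as `(Literal("title1"), URI("dc:title"), URI("NapoliLS97"))` is rejected.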

RDF to Entity-Descriptions

tid | Subject | Predicate       | Object      | Provenance
t1  | s1      | dc:subject      | o11         | DBLP
t2  | s2      | dc:subject      | o16         | DBLP
t3  | s1      | rdf:type        | Publication | DBLP
t4  | o11     | rdfs:subClassOf | C1          | ACCS
t5  | o12     | rdfs:subClassOf | C2          | ACCS
t6  | C1      | rdfs:subClassOf | C10         | ACCS

RDF triples from several resources

Entities S | Descriptions d
s1         | (dc:subject, {C1, C2, C7})
s2         | (dc:subject, {C6, C8, C9})
s3         | (dc:subject, {C4, C5})
s4         | (dc:subject, {C4, C7, C8})
s5         | (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, with nodes ⊤, C1, C2, and C4 to C15]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

The Euler tour of the taxonomy (row Node) and the depth of each visited node (row D):

D:       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Pos:     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
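The depth array D can be produced by a depth-first Euler tour of the taxonomy. The Python sketch below assumes the tree structure implied by the tour shown above (`"TOP"` stands for ⊤), and answers an LCA query with a plain linear scan over the range instead of a constant-time RMQ structure. The names `CHILDREN`, `euler_tour`, and `lca` are illustrative.

```python
# Parent -> ordered children of the example taxonomy (assumed structure).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths, revisiting a node after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")

# 0-based index of the first visit of every node.
FIRST = {}
for index, node in enumerate(NODES):
    FIRST.setdefault(node, index)


def lca(a, b):
    """Least common ancestor: the shallowest node between the first visits of a and b."""
    lo, hi = sorted((FIRST[a], FIRST[b]))
    best = min(range(lo, hi + 1), key=DEPTHS.__getitem__)
    return NODES[best]
```

For instance, `lca("C5", "C7")` is the root and `lca("C7", "C8")` is C13, matching RMQ(12, 18) = 15 and RMQ(18, 20) = 19 in the deck's 1-based positions.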


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

D:       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Pos:     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19, and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\}$, which reduces to the antichain $\{4, 21\}$, i.e., $\{C1, C13\}$
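The steps above can be sketched in Python as follows. Two points are assumptions rather than statements from the slides: an RMQ is only issued for consecutive positions that come from different antichains (in this example every consecutive pair does), and the final reduction keeps the most specific classes, using first and last Euler-tour visits to test ancestry. The RMQ itself is a naive scan, and all function names are illustrative.

```python
# Euler tour and depth array copied from the slide ("TOP" stands for the root).
NODES = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "TOP"]
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# 1-based first and last visit of every node, used for ancestor tests.
FIRST, LAST = {}, {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)
    LAST[node] = pos


def rmq(lo, hi):
    """1-based position of the leftmost minimal depth in [lo, hi] (naive scan)."""
    return min(range(lo, hi + 1), key=lambda pos: DEPTHS[pos - 1])


def candidate_positions(a_pos, b_pos):
    """RMQ of consecutive positions in the merged list, for pairs mixing A and B."""
    merged = sorted([(p, "A") for p in a_pos] + [(p, "B") for p in b_pos])
    result = []
    for (p, side_p), (q, side_q) in zip(merged, merged[1:]):
        if side_p != side_q:
            result.append(rmq(p, q))
    return result


def is_proper_ancestor(x, y):
    """x is a proper ancestor of y iff y is first visited inside x's tour interval."""
    return x != y and FIRST[x] <= FIRST[y] <= LAST[x]


def intersect(a_pos, b_pos):
    """Most specific classes among the candidate least common ancestors."""
    nodes = {NODES[p - 1] for p in candidate_positions(a_pos, b_pos)}
    return {x for x in nodes if not any(is_proper_ancestor(x, y) for y in nodes)}


positions = candidate_positions([4, 12, 20], [4, 18, 22])
result = intersect([4, 12, 20], [4, 18, 22])
```

The candidate positions are [4, 8, 15, 19, 21] and the reduced result is {C1, C13}, in line with the slide. Only one RMQ per consecutive pair is needed, which is where the $O(|A| + |B|)$ bound on the number of queries comes from.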

Resulting Pattern Concept Lattice

[Figure: ACM Computing Classification taxonomy T, with nodes ⊤, C1, C2, and C4 to C15]

[Figure: Pattern Concept lattice with concepts K0 to K11]

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: Pattern Concept lattice with concepts K0 to K11]

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: query graph pattern in which ?paper is linked to ?author by dc:creator, to ?title by dc:title, and to ?keywords by dc:subject]

Mapping $\mu: V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu: V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification.

[Figure: instantiated graph in which paper NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title, and to Classification by dc:subject]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title    ?keyword             ?author
title1    RDF                  author1
title2    Web Crawling         author1
title2    Web Search Engines   author2
title2    RDF                  author1
title2    RDF                  author2
title2    Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{WebCrawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
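The effect of VIEW BY can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LBVA implementation: the result rows are copied from the table above (one repeated row omitted), and the function name `view_by` and the encoding of attributes as (variable, value) pairs are hypothetical.

```python
from collections import defaultdict

# Result rows (title, keyword, author) taken from the slide's table.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
]
VARIABLES = ("title", "keyword", "author")


def view_by(rows, variables, view_var):
    """Group rows by view_var; every other binding becomes a (variable, value) attribute."""
    idx = variables.index(view_var)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != idx:
                context[row[idx]].add((variables[i], value))
    return dict(context)


# The same answers, seen from two perspectives.
by_title = view_by(ROWS, VARIABLES, "title")
by_author = view_by(ROWS, VARIABLES, "author")
```

Grouping by a different variable changes which values play the role of objects and which play the role of attributes, which is what lets one query yield several lattices.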

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

• People who were born in Berlin before 1900
• French Films

[Figure: graph pattern in which ?x has rdf:type Person, dbp:birthPlace Berlin, and a dbp:birthDate that is ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

[Figure: graph pattern in which ?x has dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

French Films, described by type and country as well as by category:

[Figure: graph pattern in which ?x has dcterms:subject FrenchFilm, rdf:type Film, and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates                Objects
Index  URI                Index  URI
A      dc:subject         a      dbpc:Computer_Scientists
                          b      dbpc:Turing_Award_Laureates
B      dbp:award          c      dbp:TuringAward
C      rdf:type           d      dbo:Scientist
D      dbp:field          e      dbp:Computer Sciences
E      dbp:birthPlace     f      dbo:UnitedStates
                          g      dbo:UnitedKingdom

Formal context over the attributes a to g, grouped by predicate: A (a, b), B (c), C (d), D (e), E (f, g)

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
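The scaling step can be sketched as follows. This is a minimal illustration, assuming each (predicate, object) pair becomes one binary attribute; the triple for Person2 is a hypothetical addition so that the table has more than one row, and the function names are not from the original work.

```python
# Triples for Person1 follow the slide; the Person2 triple is hypothetical.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    ("Person2", "dc:subject", "dbpc:Computer_Scientists"),
]


def triples_to_context(triples):
    """Return (objects, attributes, incidence) where attributes are (predicate, object) pairs."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


def cross_table(objects, attributes, incidence):
    """One text row per object: 'x' where the triple exists, '.' otherwise."""
    rows = []
    for g in objects:
        marks = "".join("x" if (g, m) in incidence else "." for m in attributes)
        rows.append(f"{g} {marks}")
    return rows


objects, attributes, incidence = triples_to_context(TRIPLES)
table = cross_table(objects, attributes, incidence)
```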

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$     100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Table: Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact, there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
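The reasoning above can be checked with a short sketch. The context below is an assumption: it is one assignment of attributes that agrees with the rule table (all seven persons carry a and b, five also carry c and d, two also carry e) and it ignores the birthplace attributes f and g. The helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical context consistent with the confidences and supports in the rule table.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(x, y):
    """conf(x -> y) = |x' & y'| / |x'|."""
    return Fraction(len(extent(x | y)), len(extent(x)))


def completion_candidates(x, y):
    """Entities that satisfy y but not x; they block the definition x == y."""
    return sorted(extent(y) - extent(x))


forward = confidence({"c", "d"}, {"a", "b"})     # the implication c, d => a, b
backward = confidence({"a", "b"}, {"c", "d"})    # the association rule a, b -> c, d
to_complete = completion_candidates({"c", "d"}, {"a", "b"})
```

Here `backward` evaluates to 5/7, which rounds to the 0.71 on the slide, and the blocking entities are Person6 and Person7.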

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow with the stages Reference Universe, Formal Context, Mining Implications, Ranking Implications, and the test "Can a rule be a definition $X \equiv Y$?"; the answer "Yes" leads to Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars              Videogames    Smartphones        Countries
Restriction   dc:subject        dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type          rdf:type      rdf:type           rdf:type
              dc:subject        dc:subject    dc:subject         dc:subject
              bodyStyle         cp            manufacturer       language
              transmission      developer     operativeSystem    governmentType
              assembly          requirement   developer          leaderType
              designer          genre         cpu                foundingDate
              layout            releaseDate                      gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Table: Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  1732    1714        07           5982

Table: Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth; each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

[Figure: user interface of RV-Xplorer (Rdf-View eXplorer)]

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

• Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries: Lattice-Based View Access

• Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph for 350GT, with dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, and dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Contexts: $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g), $K_{dbo:productionStartYear}$

Reventon            × × × × × ×   $\langle[2008, 2008]\rangle$
Countach            × × × × ×     $\langle[1974, 1974]\rangle$
350GT               × × × × ×     $\langle[1963, 1963]\rangle$
400GT               × × × ×       $\langle[1965, 1965]\rangle$
Islero              × × × ×       $\langle[1967, 1967]\rangle$
Veneno              × ×           $\langle[2012, 2012]\rangle$
Aventador Roadster  × ×           -

Table: Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
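The numeric part of this closure can be reproduced with a small sketch of interval patterns, assuming the similarity of two years is the smallest interval containing both. The production years are read from the heterogeneous context; Aventador Roadster is left out because it has no value there. The set `BINARY_CLOSURE` is copied from the slide rather than computed, and all function names are hypothetical.

```python
# Production start years from the heterogeneous context.
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

# Closure of A1 in the binary contexts, as stated on the slide (not recomputed here).
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def describe(entities):
    """Interval pattern shared by the entities: smallest interval covering their years."""
    years = [YEARS[g] for g in entities]
    return (min(years), max(years))


def matching(interval):
    """All entities whose year falls inside the interval."""
    low, high = interval
    return {g for g, year in YEARS.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
interval = describe(A1)                            # numeric description of A1
numeric_closure = matching(interval)               # entities covered by that interval
heterogeneous_closure = BINARY_CLOSURE & numeric_closure
```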

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: subsumption hierarchy over the classes C1 to C15, rooted at $\top$]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M: x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2: m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions for this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array).

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the positions in the tour of the tree are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
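The walk-through above can be replayed with the following sketch. It is a simplified illustration, not the authors' implementation: the tree is read off the tour shown on the slide, the root is written "T" in place of ⊤, the range minimum query is a plain linear scan rather than the constant-time structure mentioned earlier, and all function names are hypothetical.

```python
# Child -> parent links of the tree, read off the tour on the slide ("T" stands for the root).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(root):
    """Visit order and depths; a node is recorded again after each of its children."""
    kids = {}
    for child, par in PARENT.items():
        kids.setdefault(par, []).append(child)
    tour, depth = [], []

    def visit(node, d):
        tour.append(node)
        depth.append(d)
        for child in kids.get(node, []):
            visit(child, d + 1)
            tour.append(node)
            depth.append(d)

    visit(root, 0)
    return tour, depth


def rmq(depth, i, j):
    """0-based position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=depth.__getitem__)


def ancestors(node):
    """Proper ancestors of a node."""
    found = set()
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def intersect_antichains(a, b, tour, depth):
    """Pairwise common subsumers via RMQ, reduced to the most specific ones."""
    first = {}
    for pos, node in enumerate(tour):
        first.setdefault(node, pos)
    common = {tour[rmq(depth, first[x], first[y])] for x in a for y in b}
    covered = set().union(*(ancestors(n) for n in common))
    return common - covered


tour, depth = euler_tour("T")
result = intersect_antichains({"C1", "C5", "C8"}, {"C1", "C7", "C9"}, tour, depth)
```

The pairwise queries yield C1, C12, the root and C13; dropping every node that subsumes another one leaves C1 and C13, the nodes at positions 4 and 21.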


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: tree rooted at $\top$ with children C12 and C6; C12 has children C10 (with children C1, C2) and C11 (with children C4, C5); C6 has child C13 (with children C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
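A minimal sketch of this scaling, assuming the tree shown in the figure and writing the root as "T" in place of ⊤; the function names are hypothetical.

```python
# Child -> parent links of the tree in the figure ("T" stands for the root).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """Every element of the antichain together with all of its ancestors."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def minimal_elements(upward_closed):
    """Elements of an upward-closed set that are not the parent of another element."""
    parents = {PARENT[n] for n in upward_closed if n in PARENT}
    return upward_closed - parents


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
antichain = minimal_elements(fil_a & fil_b)
```

Each filter can contain as many nodes as the whole tree, which is the cost that the complexity comparison refers to.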

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling



Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together
• detect the diversity of an author
• detect major area of research for an author
• given a paper, is it possible to retrieve similar papers published

What do we need?

• A strong formalism which not only allows information retrieval but also allows classification and knowledge discovery

• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

Interactive Knowledge Discovery + Completion


Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]
• Navigala [Visani et al. 2011]
• Relational Exploration [Rudolph 2006]

2 Background


Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I$ is a binary relation

• Galois connection

  $A' = \{m \in M \mid \forall g \in A: (g, m) \in I\}$ for $A \subseteq G$

  $B' = \{g \in G \mid \forall m \in B: (g, m) \in I\}$ for $B \subseteq M$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$

• Reduced Labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×

[Figure: concept lattice of this context]

• Object Concept for $g \in G$
- Object concept is given as $(g'', g')$
- e.g. for $g_2$: $(\{g_2, g_5\}, \{m_1, m_2\})$

• Attribute Concept for $m \in M$
- Attribute concept is given as $(m', m'')$
- e.g. for $m_1$: $(\{g_1, g_2, g_5\}, \{m_1\})$
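The derivation operators and the object and attribute concepts can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the cross positions in `CONTEXT` are an assumption inferred from the worked examples on the slides, and all function names are hypothetical.

```python
# Formal context of the running example. Assumption: the column position of
# each cross is inferred from the slides' worked examples.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': the objects that have every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, row in CONTEXT.items() if wanted <= row}


def object_concept(g):
    """(g'', g') for a single object g."""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    a = extent({m})
    return a, intent(a)


if __name__ == "__main__":
    print(object_concept("g2"))
    print(attribute_concept("m1"))
```

Under these assumptions, `object_concept("g2")` and `attribute_concept("m1")` give the two example concepts listed on the slide.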

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$

• $X \rightarrow Y$ is an association rule with
- Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
- Confidence $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule: $m_2 \rightarrow m_3$

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $conf(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication: $m_3 \Rightarrow m_2$

$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
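The support and confidence values above can be checked with a small sketch. The context is the running example, with cross positions assumed from the slides' worked numbers. The function names are illustrative only.

```python
from fractions import Fraction

# Running example; cross positions are an assumption inferred from the slides.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, row in CONTEXT.items() if wanted <= row}


def support(x, y):
    """|X' ∩ Y'| / |G|."""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'|. Raises ZeroDivisionError if no object has X."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


if __name__ == "__main__":
    print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
    print(implication_holds({"m3"}, {"m2"}))
```

An implication is simply an association rule whose confidence is 1, which is why `implication_holds({"m3"}, {"m2"})` is true while the reverse direction is not.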


Complex Data

Formal Concept Analysis cannot deal with such complex data

• Numeric Data

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

• Graphs and Molecular Structure

• Syntactic Tree

• Linked Data


Pattern Structures

[Ganter and Kuznetsov, 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
- $G$, a set of objects
- $(D, \sqcap)$, a set of descriptions with a similarity operation $\sqcap$ over them
- $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects

$A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
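The interval pattern structure of the three-object example can be sketched as follows. The subsumption test `d ⊑ e` is taken to mean `d ⊓ e = d`, which is the usual order induced by the similarity operation. The function names are hypothetical and the code is only an illustration of the two derivation operators.

```python
from functools import reduce

# Numeric context of the three-object example (g1..g3 over m1..m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
}


def delta(g):
    """Describe object g as a tuple of degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity: component-wise smallest interval covering both inputs."""
    return tuple(
        (min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)
    )


def subsumed(d, e):
    """d is subsumed by e iff meet(d, e) == d (e lies inside d's intervals)."""
    return meet(d, e) == d


def common_description(objects):
    """Maximal description shared by a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def matching_objects(d):
    """All objects whose description fits inside description d."""
    return {g for g in DATA if subsumed(d, delta(g))}


if __name__ == "__main__":
    d = common_description({"g1", "g2"})
    print(d, matching_objects(d))
```

Note that `g3` is excluded because its `m1` value 4 falls outside `[5, 6]`, so `({g1, g2}, ⟨[5, 6], [7, 8], [4, 6]⟩)` is closed on both sides.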

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al., 2011]

• Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]

• Syntactic Tree [Leeuwenberg et al., 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

• Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

• Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) linked by a predicate ($U$) to an Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]

To avoid confusion, we will refer to the objects of FCA as entities.
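The typing constraint in the definition can be made concrete with a small sketch. The three classes and the function name are hypothetical and are not taken from any RDF library. They only encode the rule that subjects are URIs or blank nodes, predicates are URIs, and objects may also be literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_rdf_triple(s, p, o):
    """Check (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


if __name__ == "__main__":
    paper = URI("NapoliLS97")
    print(is_rdf_triple(paper, URI("dc:creator"), URI("Amedeo_Napoli")))
    print(is_rdf_triple(paper, URI("dc:title"), Literal("title1")))
    print(is_rdf_triple(Literal("title1"), URI("dc:title"), paper))
```

The last call is rejected because a literal cannot appear in subject position.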

RDF to Entity-Descriptions

RDF triples from several resources:

tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS

Entity descriptions:

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T over the classes ⊤, C1, C2 and C4 to C15]

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of structured attribute sets?

• Scaling
• Intersection of anti-chains
- Extended intersection operation [Carpineto and Romano, 1996]
- Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree rooted at ⊤ over the classes C1, C2 and C4 to C13]

Depth-first traversal of the taxonomy, with the depth D of each visited node:

Index   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
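The node and depth arrays above are what a depth-first traversal produces when every node is recorded on entry and again after each child returns. A minimal sketch follows. The `CHILDREN` map is read off the traversal shown on the slide, the root ⊤ is written `"TOP"`, and the function name is illustrative.

```python
# Taxonomy used in the RMQ example, read off the traversal on the slide.
# "TOP" stands for the root of the taxonomy.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (nodes, depths): each node is recorded when first entered
    and again every time the traversal returns to it from a child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


if __name__ == "__main__":
    nodes, depths = euler_tour("TOP")
    print(nodes)
    print(depths)
```

For a tree with n nodes the two arrays have 2n - 1 entries, here 25 for 13 nodes.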


Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Each class is replaced by the index of its first occurrence in the traversal array:

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the classes {C1, C13}: indices 8 (C12) and 15 (⊤) are ancestors of C1, and indices 19 and 21 both denote C13.
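The same result can be reproduced with a deliberately simple sketch. It differs from the slide in two ways, both assumptions made for clarity: the range minimum is found by a linear scan rather than a preprocessed RMQ structure, and the least common ancestor is computed for every pair (a, b) with a in A and b in B instead of only for neighbours in the merged, sorted list. The arrays are copied from the slide, with 0-based indices and `"TOP"` for ⊤.

```python
# Traversal (E) and depth (D) arrays copied from the slide, 0-based.
E = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
     "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
     "C13", "C9", "C13", "C6", "TOP"]
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First position of every class in the traversal.
FIRST = {}
for index, node in enumerate(E):
    FIRST.setdefault(node, index)


def rmq(i, j):
    """Index of the minimal depth between positions i and j (inclusive)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=D.__getitem__)


def lca(x, y):
    """Least common ancestor of classes x and y."""
    return E[rmq(FIRST[x], FIRST[y])]


def intersect(a, b):
    """Most specific common ancestors of the anti-chains a and b."""
    candidates = {lca(x, y) for x in a for y in b}
    return {
        c for c in candidates
        if not any(c != d and lca(c, d) == c for d in candidates)
    }


if __name__ == "__main__":
    print(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

With 0-based indices, `rmq(3, 11)` returns 7, which is position 8 (class C12) in the slide's 1-based numbering.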

Resulting Pattern Concept Lattice

[Figure: Pattern concept lattice over the concepts K0 to K11]

Details of the pattern concept lattice:

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K7    s4                (p1, {C4, C7, C8})
K8    s2, s5            (p1, {C8, C9})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Experimentation

Experiments on Linked Data:

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments with numerical data from Bilkent University:

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ      t_scaling
BK        96         5            35    626    10            840897    37 sec     42 sec
LO        16         7            16    224    26            1875      0043 sec   0088 sec
NT        131        3            131   140    6             128624    36 sec     68 sec
PO        60         16           22    1236   58            416837    49 sec     57 sec
PT        5000       49           22    4084   60            452316    50 sec     38 sec
PW        200        11           94    436    21            1148656   60 sec     49 sec
PY        74         28           36    340    53            771569    46 sec     40 sec
QU        2178       4            44    8212   8             783013    28 sec     30 sec
TZ        186        61           31    626    88            650041    58 sec     43 sec
VY        52         4            52    202    15            202666    59 sec     116 sec

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach
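Inter-ordinal scaling turns one numeric attribute into binary attributes of the form "≤ t" and "≥ t" for every observed value t. The sketch below is a generic illustration of that scaling, not the code used in the experiments. The sample column is the `m1` column of the numeric example shown earlier, and the attribute labels are hypothetical.

```python
def interordinal_scale(column):
    """Map {object: number} to {object: set of '<= t' / '>= t' attributes},
    with one threshold t per distinct value in the column."""
    thresholds = sorted(set(column.values()))
    context = {}
    for obj, value in column.items():
        attrs = {f"<= {t}" for t in thresholds if value <= t}
        attrs |= {f">= {t}" for t in thresholds if value >= t}
        context[obj] = attrs
    return context


if __name__ == "__main__":
    m1 = {"g1": 5, "g2": 6, "g3": 4, "g4": 4, "g5": 5}
    for obj, attrs in sorted(interordinal_scale(m1).items()):
        print(obj, sorted(attrs))
```

The scaled attributes are ordered by inclusion of their extents (every object with "≤ 4" also has "≤ 5"), which is the structure the second bullet above refers to.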

What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs


SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \rightarrow U$

Given the set of URIs $U$ and the set of query variables $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: matching RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v$ = {?title}

• $A_v$ = {?author, ?keyword}

• $G = \mu(E_v)$ = {title1, title2, ...}

• $M_1 = \mu(A_{v1})$ = {author1, author2, ...}

• $M_2 = \mu(A_{v2})$ = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
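How a VIEW BY clause turns a result table into a formal context can be sketched as follows. This is an illustration under stated assumptions rather than the LBVA implementation: the rows are a small hypothetical result set in the spirit of the table above, and `view_by` is an invented helper name. The bindings of the VIEW BY variable become the objects, and every (variable, value) pair from the remaining columns becomes an attribute.

```python
# Hypothetical SPARQL result rows (variable name -> bound value).
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, entity_var):
    """Build {object: set of (variable, value) attributes} where the
    objects are the bindings of `entity_var`."""
    context = {}
    for row in rows:
        attributes = context.setdefault(row[entity_var], set())
        for var, value in row.items():
            if var != entity_var:
                attributes.add((var, value))
    return context


if __name__ == "__main__":
    for obj, attrs in sorted(view_by(ROWS, "title").items()):
        print(obj, sorted(attrs))
    for obj, attrs in sorted(view_by(ROWS, "author").items()):
        print(obj, sorted(attrs))
```

Calling the helper with `"author"` instead of `"title"` gives the second perspective on the same answers, with authors as objects and titles and keywords as attributes.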

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> a date ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x --dcterms:subject--> FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: ?x --rdf:type--> Film, ?x --dbo:hasCountry--> France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples:

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates                 Objects
Index   URI                Index   URI
A       dc:subject         a       dbpc:Computer_Scientists
                           b       dbpc:Turing_Award_Laureates
B       dbp:award          c       dbp:TuringAward
C       rdf:type           d       dbo:Scientist
D       dbp:field          e       dbp:Computer Sciences
E       dbp:birthPlace     f       dbo:UnitedStates
                           g       dbo:UnitedKingdom

A B C D E
a b c d e f g

Person1 × × × × × ×

Person2 × × × × ×

Person3 × × × × ×

Person4 × × × ×

Person5 × × × ×

Person6 × ×

Person7 × ×

The formal context is built, after scaling, from RDF triples taken from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
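A minimal sketch of this scaling step: every subject becomes an object of the context and every distinct (predicate, object) pair becomes one binary attribute. The triples are the four shown on the slide. The function names and the lower-case index letters generated at the end are illustrative assumptions.

```python
import string

# The four triples shown on the slide, as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def triples_to_context(triples):
    """Return {subject: set of (predicate, object) attributes}."""
    context = {}
    for s, p, o in triples:
        context.setdefault(s, set()).add((p, o))
    return context


def attribute_index(context):
    """Assign a short letter to every attribute, in sorted order.
    Supports at most 26 attributes; enough for a slide-sized example."""
    attributes = sorted({a for row in context.values() for a in row})
    return dict(zip(attributes, string.ascii_lowercase))


if __name__ == "__main__":
    ctx = triples_to_context(TRIPLES)
    for attribute, letter in attribute_index(ctx).items():
        print(letter, attribute)
```

Each cross of the context then corresponds to exactly one input triple, as stated in the caption above.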

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Association rules for the running example:

Rule                          Confidence   Support   Meaning
$d \Rightarrow c$             100%         5         Every scientist has won a Turing Award
$c \Rightarrow d$             100%         5         Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$          100%         2         Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$       100%         5         All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$       71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

• Example
- $c, d \Rightarrow a, b$
- $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their descriptions
• In fact, there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
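The completion heuristic can be sketched on a reduced version of the running example. The context below is an assumption: it keeps only the attributes a to d, with the memberships that the rule table implies (Person1 to Person5 have a, b, c and d, while Person6 and Person7 have only a and b). Function names are hypothetical.

```python
from fractions import Fraction

# Running example restricted to attributes a-d (assumed from the rule table).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities that have every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, row in CONTEXT.items() if wanted <= row}


def confidence(x, y):
    """conf(X -> Y) = |X' ∩ Y'| / |X'|."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def completion_candidates(x, y):
    """If X => Y holds, return the entities that have Y but lack X.
    These are the descriptions to complete if X ≡ Y is accepted."""
    if not extent(x) <= extent(y):
        raise ValueError("X => Y does not hold in the context")
    return extent(y) - extent(x)


if __name__ == "__main__":
    x, y = {"c", "d"}, {"a", "b"}
    print(float(confidence(y, x)))
    print(sorted(completion_candidates(x, y)))
```

Whether the near-definition is accepted as a definition remains a decision for the domain expert. The code only lists which entities would then have to be completed.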

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Predicates used to construct each dataset:

              Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    governmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS: First_Person_Shooters, cp: computerPlatform

Dataset characteristics:

                Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         07            5982

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation serves as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Points of View

• Hide non-interesting parts of the lattice

• Search Capabilities

Demo videos: main_topic_orpailleur.mp4, altering_nav_space.mp4, nav_across_point_of_view.mp4, hide_parts.mp4, search_concept_lattice.mp4

7 Conclusion and Perspectives

Conclusion

• Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries - Lattice-Based View Access

• Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

[Figure: Process of Interactively Exploring RDF Data]

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya, 2010]

• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may contain different data types and structures, including dates, numbers, collections and strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph for 350GT: dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dbo:productionYear 1964 (xsd:date)]

Heterogeneous context with numeric values:

KA KB KC KD KE K_dbo:productionStartYear
a b c d e f g

Reventon             × × × × × ×   ⟨[2008, 2008]⟩
Countach             × × × × ×     ⟨[1974, 1974]⟩
350GT                × × × × ×     ⟨[1963, 1963]⟩
400GT                × × × ×       ⟨[1965, 1965]⟩
Islero               × × × ×       ⟨[1967, 1967]⟩
Veneno               × ×           ⟨[2012, 2012]⟩
Aventador Roadster   × ×           -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\square\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

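• Let $K = (G, M, I)$ be a formal context

• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1         | x  | x  |    |    |    | x  |    |
s2         |    |    |    |    | x  |    | x  | x
s3         |    |    | x  | x  |    |    |    |
s4         |    |    | x  |    |    | x  | x  |
s5         |    |    |    |    |    |    | x  | x

[Figure: subsumption hierarchy over the extended attribute set, with top element ⊤ and nodes C1, C2 and C4 to C15]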


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$

• The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$
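The infimum of two antichains can be sketched in Python directly from the definition above. This is only an illustration under stated assumptions, not the authors' implementation: the function name `antichain_meet`, the explicit `universe` argument and the divisibility order used as a toy poset are all introduced here for the example.

```python
def antichain_meet(ac1, ac2, universe, leq):
    """Maximal elements of {m | m <= some element of ac1 and m <= some element of ac2}."""
    # Order ideal of the elements lying below both antichains.
    ideal = {
        m for m in universe
        if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)
    }
    # Keep the elements that have no strictly larger element inside the ideal.
    return {m for m in ideal if not any(m != n and leq(m, n) for n in ideal)}


def divides(m, n):
    """Toy partial order (an assumption of this sketch): m <= n iff m divides n."""
    return n % m == 0


if __name__ == "__main__":
    # {4, 9} and {6} are antichains for divisibility on 1..12.
    print(antichain_meet({4, 9}, {6}, range(1, 13), divides))
```

The brute-force scan over `universe` is quadratic and is meant only to mirror the set-builder definition; it says nothing about how the computation is organised in practice.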

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2  1  0  3  2 ]
Positions:   1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
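As a rough illustration of answering range minimum queries in constant time, the sketch below uses a sparse table. Note the swap: a sparse table needs O(n log n) preprocessing, not the O(n) bound quoted above, and is chosen here only because it is short. The names `build_sparse_table` and `rmq` are hypothetical, and positions in the code are 0-based whereas the slide counts from 1.

```python
def build_sparse_table(values):
    """table[j][i] = position of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half, row = table[-1], 1 << (j - 1), []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """0-based position of the (leftmost) minimum in values[lo..hi], inclusive."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


if __name__ == "__main__":
    array = [2, 1, 0, 3, 2]           # the array from the slide
    table = build_sparse_table(array)
    print(rmq(array, table, 0, 4) + 1)  # 1-based position of the minimum
```

Each query combines two overlapping power-of-two blocks that together cover the range, which is why no loop is needed at query time.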

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling


• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
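The filter-based intersection can be replayed on the taxonomy of the slide with a few lines of Python. This is a minimal sketch: the parent map `PARENT`, the label `"TOP"` standing for ⊤ and the function names are assumptions made for the example.

```python
# Parent of each node in the taxonomy tree ("TOP" stands for the top element).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def fil(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(n) for n in antichain))


def most_specific(nodes):
    """Elements that are not a proper ancestor of another element of the set."""
    return {x for x in nodes if not any(y != x and x in up_set(y) for y in nodes)}


if __name__ == "__main__":
    common = fil({"C1", "C5", "C8"}) & fil({"C1", "C7", "C9"})
    print(sorted(common), sorted(most_specific(common)))
```

Every filter may contain up to one node per level of the taxonomy for each antichain element, which is the cost the complexity discussion attributes to scaling.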

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 10: Interactive Knowledge Discovery over Web of Data

Can we perform data analysis over Web of Data?

Possible Use-Cases

• groups of authors working together

• detect the diversity of an author

• detect the major area of research for an author

• given a paper, is it possible to retrieve similar papers published

What do we need?

• A strong formalism which allows not only information retrieval but also classification and knowledge discovery

• A structure which
  - is close to the original format of the web of data
  - allows search and navigation
  - enables interaction with the domain expert

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

[Figure: Interactive Knowledge Discovery + Completion]

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]

• Navigala [Visani et al 2011]

• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]


• A formal context is a triple $(G, M, I)$:
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation

• Galois connection:

  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$

  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$

• Reduced labeling:
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$

• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
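The derivation operators and the object and attribute concepts can be sketched in Python as follows. The incidence used in `CONTEXT` is a reconstruction: it is the one consistent with the example concepts and the rule statistics quoted on these slides, and should be read as an assumption. All function names are hypothetical.

```python
# Formal context reconstructed from the examples on the slides (assumption).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': attributes shared by every object in A."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': objects having every attribute in B."""
    return {g for g, row in CONTEXT.items() if set(attributes) <= row}


def object_concept(g):
    """(g'', g')"""
    g_prime = intent({g})
    return extent(g_prime), g_prime


def attribute_concept(m):
    """(m', m'')"""
    m_prime = extent({m})
    return m_prime, intent(m_prime)


if __name__ == "__main__":
    print(object_concept("g2"))
    print(attribute_concept("m1"))
```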

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule $m_2 \rightarrow m_3$

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
$conf(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
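The figures quoted for the rule and the implication can be checked with a short Python sketch. As before, the incidence in `CONTEXT` is reconstructed from the slide's examples and is an assumption, and the helper names are illustrative.

```python
from fractions import Fraction

# Formal context reconstructed from the examples on the slides (assumption).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': objects having every attribute in X."""
    return {g for g, row in CONTEXT.items() if set(attributes) <= row}


def support(x, y):
    """|X' n Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' n Y'| / |X'|"""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


if __name__ == "__main__":
    print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
    print(implication_holds({"m3"}, {"m2"}))
```

An implication is simply an association rule whose confidence is 1, which is why only $m_3 \Rightarrow m_2$ qualifies in this context.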


Complex Data

Formal Concept Analysis cannot deal with such complex data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Numeric Data

[Figures: Graphs and Molecular Structure; Syntactic Tree; Linked Data]

Pattern Structures


[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of:
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:

• The maximal description representing the similarity of a set of objects:

  $A^{\square} = \sqcap_{g \in A}\, \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
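The interval pattern structure of the three-object table can be sketched in Python as below. The names `DELTA`, `meet`, `subsumed`, `description` and `extent` are illustrative assumptions; the subsumption test uses the usual characterisation $d \sqsubseteq e$ iff $d \sqcap e = d$.

```python
from functools import reduce

# delta(g): one degenerate interval per numeric attribute (m1, m2, m3).
DELTA = {
    "g1": ((5, 5), (7, 7), (6, 6)),
    "g2": ((6, 6), (8, 8), (4, 4)),
    "g3": ((4, 4), (8, 8), (5, 5)),
}


def meet(d1, d2):
    """Component-wise similarity: [a1, b1] meet [a2, b2] = [min(a1, a2), max(b1, b2)]."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed(d, e):
    """d is subsumed by e iff meeting them gives back d."""
    return meet(d, e) == d


def description(objects):
    """A^square: meet of the descriptions of a non-empty set of objects."""
    return reduce(meet, (DELTA[g] for g in objects))


def extent(d):
    """d^square: all objects whose description is subsumed by d."""
    return {g for g, dg in DELTA.items() if subsumed(d, dg)}


if __name__ == "__main__":
    d = description({"g1", "g2"})
    print(d, sorted(extent(d)))
```

Object g3 is excluded from the extent because its value 4 for m1 falls outside the interval [5, 6].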

Complex Data

• Numeric data: Interval Pattern Structures [Kaytoue et al 2011]

• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

• Syntactic Tree [Leeuwenberg et al 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$), with U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: RDF graph in which NapoliLS97 has dc:creator Amedeo_Napoli, dc:title title1 and dc:subject Classification]

To avoid confusion, we will call the objects of FCA entities.
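The typing constraint in the definition of an RDF triple can be made concrete with a small Python sketch. The classes `URI`, `BlankNode` and `Literal` and the function `is_valid_triple` are hypothetical names introduced for illustration; they are not taken from any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o):
    """Check (s, p, o) against (U u B) x U x (U u B u L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


if __name__ == "__main__":
    paper = URI("NapoliLS97")
    print(is_valid_triple(paper, URI("dc:creator"), URI("Amedeo_Napoli")))
    print(is_valid_triple(paper, URI("dc:title"), Literal("title1")))
    # A literal cannot be a subject.
    print(is_valid_triple(Literal("title1"), URI("dc:title"), paper))
```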

RDF to Entity-Descriptions

tid | Subject | Predicate       | Object      | Provenance
t1  | s1      | dc:subject      | o11         | DBLP
t2  | s2      | dc:subject      | o16         | DBLP
t3  | s1      | rdf:type        | Publication | DBLP
t4  | o11     | rdfs:subClassOf | C1          | ACCS
t5  | o12     | rdfs:subClassOf | C2          | ACCS
t6  | C1      | rdfs:subClassOf | C10         | ACCS

RDF triples from several resources

Entities S | Descriptions d
s1         | (dc:subject, {C1, C2, C7})
s2         | (dc:subject, {C6, C8, C9})
s3         | (dc:subject, {C4, C5})
s4         | (dc:subject, {C4, C7, C8})
s5         | (dc:subject, {C8, C9})

[Figure: taxonomy with top element ⊤ and nodes C1, C2 and C4 to C15]

ACM Computing Classification Taxonomy |T|

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA


Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
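The depth array D can be produced by a depth-first traversal that records a node each time it is entered or returned to (an Euler tour). The Python sketch below rebuilds it for the taxonomy of the slide; the dictionary `TREE`, the label `"TOP"` for ⊤ and the function names are assumptions, and the range minimum is found by a plain linear scan rather than a constant-time structure.

```python
# Children of each inner node, in left-to-right order ("TOP" stands for the top element).
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths, recording a node on every visit."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)   # back at the parent after each subtree
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, u, v):
    """Least common ancestor: the node of minimal depth between u and v in the tour."""
    i, j = sorted((nodes.index(u), nodes.index(v)))
    k = min(range(i, j + 1), key=depths.__getitem__)
    return nodes[k]


if __name__ == "__main__":
    nodes, depths = euler_tour(TREE, "TOP")
    print(depths)
    print(lca(nodes, depths, "C7", "C9"), lca(nodes, depths, "C5", "C7"))
```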

Range Minimum Query - An Implementation


• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, which corresponds to the antichain $\{C1, C13\}$
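The computation on this slide can be replayed with the following Python sketch. Several points are assumptions of the sketch rather than statements about the authors' code: the arrays are hard-coded from the slide with `"TOP"` standing for ⊤, `rmq` is a linear scan instead of a constant-time structure, only consecutive positions coming from different sets are queried (in this example the merged list alternates between A and B, so every consecutive pair qualifies), and the final reduction keeps candidates that are not an ancestor of another candidate.

```python
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def position(node):
    """1-based position of the first occurrence of a node in the tour."""
    return NODES.index(node) + 1


def rmq(lo, hi):
    """1-based position of the leftmost minimal depth between lo and hi (linear scan)."""
    lo, hi = min(lo, hi), max(lo, hi)
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])


def is_ancestor(u, v):
    """u is an ancestor of (or equal to) v iff v first occurs inside u's span of the tour."""
    first_u = NODES.index(u)
    last_u = len(NODES) - 1 - NODES[::-1].index(u)
    return first_u <= NODES.index(v) <= last_u


def intersect(a, b):
    """Candidate positions and the resulting antichain for two antichains of leaves."""
    merged = sorted([(position(n), "A") for n in a] + [(position(n), "B") for n in b])
    candidates = [rmq(p, q)
                  for (p, sp), (q, sq) in zip(merged, merged[1:]) if sp != sq]
    found = {NODES[p - 1] for p in candidates}
    result = {u for u in found
              if not any(u != v and is_ancestor(u, v) for v in found)}
    return candidates, result


if __name__ == "__main__":
    print(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

Positions 19 and 21 both denote C13, while positions 8 (C12) and 15 (⊤) are ancestors of C1, which is how five candidate positions reduce to two nodes.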

Resulting Pattern Concept Lattice

Entities S | Descriptions d
s1         | (dc:subject, {C1, C2, C7})
s2         | (dc:subject, {C6, C8, C9})
s3         | (dc:subject, {C4, C5})
s4         | (dc:subject, {C4, C7, C8})
s5         | (dc:subject, {C8, C9})

[Figure: taxonomy with top element ⊤ and nodes C1, C2 and C4 to C15]

[Figure: pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset         | |G|  | |T|   | |Leaves(T)| | |L|     | t_RMQ | t_scaling
DBLP            | 5293 | 33207 | 33198       | 10134   | 45s   | 21s
Biomedical Data | 63   | 1490  | 933         | 1725582 | 145s  | 162s

Experiments on Linked Data

Dataset | entities | attributes | |G| | |T|  | |Leaves(T)| | |L|     | t_RMQ     | t_scaling
BK      | 96       | 5          | 35  | 626  | 10          | 840897  | 37 sec    | 42 sec
LO      | 16       | 7          | 16  | 224  | 26          | 1875    | 0.043 sec | 0.088 sec
NT      | 131      | 3          | 131 | 140  | 6           | 128624  | 36 sec    | 68 sec
PO      | 60       | 16         | 22  | 1236 | 58          | 416837  | 49 sec    | 57 sec
PT      | 5000     | 49         | 22  | 4084 | 60          | 452316  | 50 sec    | 38 sec
PW      | 200      | 11         | 94  | 436  | 21          | 1148656 | 60 sec    | 49 sec
PY      | 74       | 28         | 36  | 340  | 53          | 771569  | 46 sec    | 40 sec
QU      | 2178     | 4          | 44  | 8212 | 8           | 783013  | 28 sec    | 30 sec
TZ      | 186      | 61         | 31  | 626  | 88          | 650041  | 58 sec    | 43 sec
VY      | 52       | 4          | 52  | 202  | 15          | 202666  | 59 sec    | 116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach ()

What we did so far

How to allow feedback from domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice K0 to K11 with its table of extents and intents, identical to the slide "Resulting Pattern Concept Lattice"]

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: query graph pattern in which ?paper has dc:creator ?author, dc:title ?title and dc:subject ?keywords]

Mapping $\mu: V \rightarrow U$

Given $U$ and $V$ ($V$ being the set of query variables), a mapping $\mu$ is a partial function $\mu: V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: matched RDF graph in which NapoliLS97 has dc:creator Amedeo_Napoli, dc:title title1 and dc:subject Classification]

Lattice-Based View Access [Alam et al CLA 2014]


SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE

paper dctitle title

paper dccreator author

paper dcsubject keyword

filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))

VIEW BY title

title keyword author

title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1

sbquo Ev ldquo ttitleu

sbquo Av ldquo tauthor keywordu

sbquo G ldquo micropEvq ldquo ttitle1 title2 u

sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u

sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u

Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam

Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014

Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
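As a rough illustration of how a VIEW BY clause can turn query answers into a formal context, the following Python sketch groups answer rows by the chosen variable. The rows, the function name `view_by` and the choice to tag each attribute value with its column index are assumptions made for this example only; they are not taken from the original system.

```python
from collections import defaultdict

# Hypothetical answer rows (title, keyword, author), as a SPARQL endpoint might return them.
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]

def view_by(rows, view_index):
    """Group rows by the VIEW BY column; the other columns become attributes."""
    context = defaultdict(set)
    for row in rows:
        entity = row[view_index]
        for i, value in enumerate(row):
            if i != view_index:
                # Tag each value with its column so keywords and authors stay distinct.
                context[entity].add((i, value))
    return dict(context)

by_title = view_by(rows, 0)   # entities are titles
by_author = view_by(rows, 2)  # entities are authors
```

Changing only the index passed to `view_by` yields a different set of entities over the same answers, which is the idea behind viewing one result set from several perspectives.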

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern. ?x has rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900.]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER(?d <= 1900)
}

French Films

[Figure: graph patterns. ?x has dcterms:subject FrenchFilm; ?x has rdf:type Film and dbo:hasCountry France.]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates                Objects
Index  URI                Index  URI
A      dc:subject         a      dbpc:Computer_Scientists
                          b      dbpc:Turing_Award_Laureates
B      dbp:award          c      dbp:TuringAward
C      rdf:type           d      dbo:Scientist
D      dbp:field          e      dbp:Computer Sciences
E      dbp:birthPlace     f      dbo:UnitedStates
                          g      dbo:UnitedKingdom

Columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
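The scaling step can be pictured with a short Python sketch in which every (predicate, object) pair becomes one binary attribute of the subject. The function name `build_context` and the triple shown for Person6 are illustrative assumptions, not part of the original material.

```python
def build_context(triples):
    """Map each subject to its set of (predicate, object) attributes."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context

triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    # Hypothetical triple for a second entity, added only for illustration.
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
]

context = build_context(triples)
# The attribute set M is the union of all (predicate, object) pairs.
attributes = sorted({a for attrs in context.values() for a in attrs})
```

Each entry of `context` corresponds to one row of the cross table, and each element of `attributes` to one column.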

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete

Rule                   Confidence  Support  Meaning
$d \Rightarrow c$      100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$      100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c,d$    100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
$c,d \Rightarrow a,b$  100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a,b \to c,d$          71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
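The reasoning behind completion can be sketched in Python. The context below is an assumption: it is one assignment of attributes that is consistent with the rule counts of the running example (a and b for all seven persons, c and d for five of them), and the placement of e on Person1 and Person2 is a guess. The function names are hypothetical.

```python
# Assumed context, consistent with the rule table of the running example.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attrs):
    """Entities that have every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}

def confidence(x, y):
    """Confidence of the association rule x -> y."""
    ext_x = extent(x)
    return len(ext_x & extent(y)) / len(ext_x)

def completion_candidates(x, y):
    """Entities satisfying x but not y; if x and y form a definition, these lack triples."""
    return sorted(extent(x) - extent(y))

conf_forward = confidence({"c", "d"}, {"a", "b"})    # implication c,d => a,b
conf_backward = confidence({"a", "b"}, {"c", "d"})   # association rule a,b -> c,d
candidates = completion_candidates({"a", "b"}, {"c", "d"})
```

Here the forward rule is an implication and the backward rule falls short of 100% only because of the entities returned by `completion_candidates`, which is the situation the slide describes for Person6 and Person7.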

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data.]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars              Videogames    Smartphones        Countries
Restriction   dc:subject        dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type          rdf:type      rdf:type           rdf:type
              dc:subject        dc:subject    dc:subject         dc:subject
              bodyStyle         cp            manufacturer       language
              transmission      developer     operativeSystem    governmentType
              assembly          requirement   developer          leaderType
              designer          genre         cpu                foundingDate
              layout            releaseDate                      gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames  Smartphones  Countries
Subjects        529     655         363          3153
Objects         1291    3265        495          8315
Triples         12519   20146       4710         50000
Concepts        14657   31031       1232         13754
Exec time [s]   1732    1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation is used as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets.]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities

[Demo videos: main_topic_orpailleur.mp4, altering_nav_space.mp4, nav_across_point_of_view.mp4, hide_parts.mp4, search_concept_lattice.mp4]

7 Conclusion and Perspectives


Conclusion

• Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries: Lattice-Based View Access

• Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. $X \to Y$ and $Y \to X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37–54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129–142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342–1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190–208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis, 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153–168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2–21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449–473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph for 350GT. dc:subject → dbpc:Sports_cars, dc:subject → dbpc:Lamborghini_vehicles, rdf:type → dbo:Automobile, dbo:manufacturer → dbp:Lamborghini, dbo:productionYear → 1964 (xsd:date).]

Why Pattern Structures

Columns: $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g), $K_{\text{dbo:productionStartYear}}$

Reventon            × × × × × ×   $\langle [2008, 2008] \rangle$
Countach            × × × × ×     $\langle [1974, 1974] \rangle$
350GT               × × × × ×     $\langle [1963, 1963] \rangle$
400GT               × × × ×       $\langle [1965, 1965] \rangle$
Islero              × × × ×       $\langle [1967, 1967] \rangle$
Veneno              × ×           $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $\mathrm{range}(p) \subseteq U$, $p$ is an object relation
  - if $\mathrm{range}(p) \subseteq L$, $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Using the heterogeneous context with numeric values, let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^{*}$, organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1   C2   C4   C5   C6   C7   C8   C9
s1          ×    ×                   ×
s2                              ×         ×    ×
s3                    ×    ×
s4                    ×              ×    ×
s5                                        ×    ×

[Figure: subsumption hierarchy with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9.]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain; conversely, each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the least common subsumer (LCS) within a tree can be reduced to the Range Minimum Query (RMQ) problem

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2  1  0  3  2 ]
Positions    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
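To make the RMQ problem concrete, here is a minimal Python sketch based on a sparse table. Note that this is a simpler, well-known variant with $O(n \log n)$ preprocessing and $O(1)$ queries, not the $O(n)$ preprocessing scheme referred to above; the function names are hypothetical and indices are 0-based.

```python
def build_sparse_table(values):
    """table[k][i] holds the index of a minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """Index of a minimum of values[lo..hi], with lo <= hi, both inclusive."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
```

For the array of the slide, `rmq(array, table, 0, 4)` returns index 2, i.e. position 3 in the 1-based numbering used above.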

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions in the Euler tour of the taxonomy are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Steps shown for pairs of positions from A and B:

• RMQ(4, 4) = 4, giving {4}
• RMQ(4, 18) = 15, giving {4, 15}
• RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}
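The naive procedure can be replayed with a small Python sketch over the Euler tour given above. It is only an illustration under stated assumptions: the RMQ is done by a linear scan rather than a constant-time structure, `"T"` stands for the top element, positions are 1-based as on the slide, and all function names are hypothetical.

```python
# Euler tour of the example taxonomy and the depth of each visited node.
TOUR = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "T"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
# First 1-based position of every node in the tour.
FIRST = {node: TOUR.index(node) + 1 for node in set(TOUR)}

def rmq(i, j):
    """1-based position of the shallowest node between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda pos: DEPTH[pos - 1])

def lca(i, j):
    """Node at the position returned by the range minimum query."""
    return TOUR[rmq(i, j) - 1]

def is_strict_ancestor(x, y):
    return x != y and lca(FIRST[x], FIRST[y]) == x

def naive_intersection(a_positions, b_positions):
    """One RMQ per pair of positions, then keep only the most specific subsumers."""
    subsumers = {lca(i, j) for i in a_positions for j in b_positions}
    return {x for x in subsumers
            if not any(is_strict_ancestor(x, y) for y in subsumers)}

result = naive_intersection([4, 12, 20], [4, 18, 22])
```

The double loop in `naive_intersection` issues one query per pair, which is where the quadratic number of RMQs of the naive approach comes from.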

An associated scaling

• Scale the antichains to the corresponding filters
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element of the antichain

[Figure: taxonomy. ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9).]

$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
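The scaling to filters can be sketched in Python as follows. This is an illustration only: the child-to-parent map `PARENT` transcribes the example taxonomy, `"T"` stands for the top element, and the function names are hypothetical.

```python
# Child -> parent map of the example taxonomy; "T" stands for the top element.
PARENT = {
    "C12": "T", "C10": "C12", "C1": "C10", "C2": "C10",
    "C11": "C12", "C4": "C11", "C5": "C11",
    "C6": "T", "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}

def ancestors_or_self(node):
    """The node together with all of its ancestors up to the top."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def filter_of(antichain):
    """All elements that are larger than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        result |= ancestors_or_self(node)
    return result

def to_antichain(nodes):
    """Drop every node that has a strict descendant in the set."""
    return {n for n in nodes
            if not any(n != m and n in ancestors_or_self(m) for m in nodes)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
result = to_antichain(common)
```

Each filter can contain every node of the taxonomy, so the set intersection touches far more elements than the antichains themselves.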

Complexity

Complexity of the naive approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of the improved approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs over consecutive elements is $O(|A| + |B|)$.

Complexity of the associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization

Interactive Knowledge Discovery + Completion

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]

• Navigala [Visani et al. 2011]

• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation

• Galois connection, for $A \subseteq G$ and $B \subseteq M$:

  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$

  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$

• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1   m2   m3
g1    ×
g2    ×    ×
g3         ×    ×
g4         ×    ×
g5    ×    ×    ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$

• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$
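The derivation operators and the two kinds of concepts can be checked with a few lines of Python. The context below is an assumption: the positions of the crosses are inferred from the examples given on these slides (object concept of g2, attribute concept of m1, and the rule measures for m2 and m3). Function names are hypothetical.

```python
# Toy context; cross positions inferred from the examples in the slides.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}

def intent(objects):
    """A': the attributes shared by all objects in A."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result

def extent(attributes):
    """B': the objects that have all attributes in B."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}

def object_concept(g):
    """(g'', g') for an object g."""
    b = intent({g})
    return extent(b), b

def attribute_concept(m):
    """(m', m'') for an attribute m."""
    a = extent({m})
    return a, intent(a)
```

With this context, `object_concept("g2")` and `attribute_concept("m1")` give back the two example concepts listed above.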

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \to Y$ is an association rule with
  - Support: $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $\mathrm{conf}(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association rule $m_2 \to m_3$

$\sigma(m_2 \to m_3) = 3/5 = 60\%$
$\mathrm{conf}(m_2 \to m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

$\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
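The two measures and the implication test can be computed directly from the definitions. In the Python sketch below, the toy context is an assumption (cross positions inferred from the examples on the slides) and the function names are hypothetical; exact fractions are used so the values can be compared with 3/5 and 3/4.

```python
from fractions import Fraction

# Toy context; cross positions inferred from the examples in the slides.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}

def extent(attributes):
    """X': the objects that have all attributes in X."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}

def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))

def confidence(x, y):
    """|X' ∩ Y'| / |X'|"""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))

def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)
```

An implication is thus the special case of an association rule whose confidence equals 1.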

Complex Data

Formal Concept Analysis cannot deal with such complex data:

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

Numeric Data

[Figures: Graphs and Molecular Structure; Syntactic Tree; Linked Data]

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, a set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as follows:

• The maximal description representing the similarity of a set of objects:

  $A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
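For interval patterns, both derivation operators reduce to simple interval arithmetic. The Python sketch below uses the three-object numeric table of the preceding slide; the function names are hypothetical, and subsumption is taken to mean that every interval of the more specific description lies inside the corresponding interval of the more general one.

```python
from functools import reduce

# Three-object numeric context (values for m1, m2, m3).
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}

def delta(g):
    """Description of g: one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in DATA[g])

def meet(d1, d2):
    """Component-wise similarity: the smallest interval covering both."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))

def subsumes(d, e):
    """d is more general than e: each interval of e lies within that of d."""
    return all(a <= c and hi <= b for (a, b), (c, hi) in zip(d, e))

def description(objects):
    """Common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))

def extent(d):
    """All objects whose description is subsumed by d."""
    return {g for g in DATA if subsumes(d, delta(g))}

pattern = description(["g1", "g2"])
```

Applying `extent` to `pattern` returns the two starting objects, so the pair forms a pattern concept of this small context.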

Complex Data

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

• Numeric data: Interval Pattern Structures [Kaytoue et al. 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources contain only RDF triples

• Some resources contain only schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification


3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)

U = URI, B = Blank Nodes, L = Literal

Example of RDF Graph for Publications

NapoliLS97 --dc:creator--> Amedeo_Napoli
NapoliLS97 --dc:title--> title1
NapoliLS97 --dc:subject--> Classification

To avoid confusion, we will call objects in FCA entities.

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy (T), rooted at ⊤, over the classes C1, C2 and C4 to C15]
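One way to read the step from triples to entity descriptions is sketched below. This is an assumption about the mapping, not the authors' code: each `dc:subject` object is replaced by the taxonomy class it is declared a subclass of, and the last triple in `TRIPLES` is hypothetical, added only so that `s1` receives more than one class.

```python
from collections import defaultdict

# (subject, predicate, object, provenance); t1..t6 come from the slide.
TRIPLES = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
    # Hypothetical triple, not on the slide.
    ("s1", "dc:subject", "o12", "DBLP"),
]


def entity_descriptions(triples, predicate="dc:subject"):
    """Group, per entity, the taxonomy classes reached through `predicate`."""
    # Schema side: the class directly above each resource.
    parent = {s: o for s, p, o, _ in triples if p == "rdfs:subClassOf"}
    descriptions = defaultdict(set)
    for s, p, o, _ in triples:
        # Data side: keep only objects that the schema can classify.
        if p == predicate and o in parent:
            descriptions[s].add(parent[o])
    return dict(descriptions)


print(entity_descriptions(TRIPLES))
```

In this sketch `s2` gets no description because the slide's excerpt contains no schema triple for `o16`.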

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has leaves C1 and C2; C11 has leaves C4 and C5; C6 has child C13; C13 has leaves C7, C8 and C9]

Euler tour of the taxonomy with the depth array D:

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
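The depth array D can be reproduced with a depth-first Euler tour. The sketch below is illustrative only: the tree edges are read off the Euler tour shown on the slide, `"T"` stands for the root ⊤, the helper names are invented here, and the range minimum query is a plain linear scan rather than a constant-time structure.

```python
def euler_tour(tree, root):
    """Return (nodes, depths, first) for a depth-first Euler tour of `tree`."""
    nodes, depths, first = [], [], {}

    def visit(node, depth):
        first.setdefault(node, len(nodes))  # remember the first occurrence
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)              # come back to the parent
            depths.append(depth)

    visit(root, 0)
    return nodes, depths, first


def lca(a, b, nodes, depths, first):
    """Least common ancestor: shallowest node between the two occurrences."""
    lo, hi = sorted((first[a], first[b]))
    # Naive range minimum query: scan the range for the smallest depth.
    best = min(range(lo, hi + 1), key=depths.__getitem__)
    return nodes[best]


TAXONOMY = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

nodes, depths, first = euler_tour(TAXONOMY, "T")
print(depths)
print(lca("C1", "C5", nodes, depths, first))
print(lca("C7", "C8", nodes, depths, first))
```

Positions in the code are 0-based, so the slide's position 4 for C1 corresponds to `first["C1"] == 3`.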


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Positions in the Euler tour: $A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$
Merged in increasing order: $A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}: positions 8 (C12) and 15 (⊤) are ancestors of C1, and positions 19 and 21 both denote C13, so $A \sqcap B = \{C_1, C_{13}\}$
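The walk-through above can be sketched in a few lines. This is one reading of the slide, not necessarily the authors' exact procedure: the Euler tour and depth array are copied from the slide (with `"T"` for ⊤), only neighbouring entries of the merged list that come from different sets are queried, and the RMQ is a linear scan. The names `rmq`, `is_ancestor` and `intersect` are chosen here for illustration.

```python
EULER = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
         1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# 0-based position of the first occurrence of every class in the tour.
FIRST = {}
for pos, node in enumerate(EULER):
    FIRST.setdefault(node, pos)


def rmq(lo, hi):
    """Position of the minimal depth in DEPTH[lo..hi] (naive scan)."""
    return min(range(lo, hi + 1), key=DEPTH.__getitem__)


def is_ancestor(x, y):
    """True if x lies on the path from the root to y (x may equal y)."""
    lo, hi = sorted((FIRST[x], FIRST[y]))
    return EULER[rmq(lo, hi)] == x


def intersect(a, b):
    """Most specific common generalizations of two antichains."""
    merged = sorted([(FIRST[c], "A") for c in a] +
                    [(FIRST[c], "B") for c in b])
    candidates = set()
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:  # one element from each antichain
            candidates.add(EULER[rmq(p, q)])
    # Drop every candidate that is a proper ancestor of another one.
    return {c for c in candidates
            if not any(c != d and is_ancestor(c, d) for d in candidates)}


print(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

Restricting the queries to neighbouring entries works because, in Euler tour order, the common ancestor of two entries that are further apart is never more specific than the common ancestor of a pair lying between them.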

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]

[Figure: Pattern Concept lattice, drawn in levels K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: Pattern Concept lattice, drawn in levels K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: query graph pattern. ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \to U$

Given the URIs $U$ and the query variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification

[Figure: matched RDF graph. NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]


Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title   keyword             author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{\mathit{title}\}$
• $A_v = \{\mathit{author}, \mathit{keyword}\}$
• $G = \mu(E_v) = \{\mathit{title}_1, \mathit{title}_2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{\mathit{author}_1, \mathit{author}_2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{\mathit{WebCrawling}, \mathit{RDF}, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
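The effect of VIEW BY can be illustrated by turning result rows into a formal context whose objects are the values of the chosen variable. The sketch below is an illustration under assumptions, not the LBVA implementation: the rows in `ROWS` are hypothetical, and `build_context`, `intent` and `extent` are names invented here.

```python
# Hypothetical (title, keyword, author) answers of the SPARQL query.
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
    ("title3", "Web Crawling", "author2"),
]


def build_context(rows, columns, view_by):
    """Objects are the values of `view_by`; attributes are the other bindings."""
    key = columns.index(view_by)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[key], set())
        attrs.update((col, val)
                     for i, (col, val) in enumerate(zip(columns, row))
                     if i != key)
    return context


def intent(context, objects):
    """Attributes shared by all the given objects."""
    sets = [context[g] for g in objects]
    return set.intersection(*sets) if sets else set()


def extent(context, attrs):
    """Objects having all the given attributes."""
    return {g for g, desc in context.items() if attrs <= desc}


by_title = build_context(ROWS, COLUMNS, "title")
by_author = build_context(ROWS, COLUMNS, "author")

papers = extent(by_title, {("keyword", "RDF")})
print(papers, intent(by_title, papers))
print(sorted(by_author))
```

Switching the `view_by` argument from `"title"` to `"author"` yields a different context over the same answers, which corresponds to viewing the results from another perspective.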

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: query graph. ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> (date ≤ 1900)]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: two graphs for the same information need. ?x --dcterms:subject--> FrenchFilm, versus ?x --rdf:type--> Film and ?x --dbo:hasCountry--> France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

         A B C D E
         a b c d e f g
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete

Rule                         Confidence  Support  Meaning
$d \Rightarrow c$            100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$            100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$         100%        2        Every person with the field computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$      100%        5        Every scientist who has won a Turing Award is categorized as a Turing Award Laureate and a Computer Scientist
$a, b \to c, d$              71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $\{a, b\} \equiv \{c, d\}$, i.e. $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$
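The confidence figure and the entities needing completion can be recomputed from a small context. The sketch below rests on an assumption: the exact column of each cross is not recoverable from the slide, so the placement of `e`, `f` and `g` in `CONTEXT` is chosen here only to match the row sizes and the rule supports in the table. The function names are illustrative.

```python
# Assumed formal context (entity -> attributes), consistent with the slide's
# row sizes and rule supports; the e/f/g placement is a guess.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def support(attrs):
    """Number of entities whose description contains all of `attrs`."""
    return sum(1 for desc in CONTEXT.values() if attrs <= desc)


def confidence(x, y):
    """Confidence of the association rule x -> y."""
    return support(x | y) / support(x)


def needs_completion(x, y):
    """Entities that satisfy x but miss part of y."""
    return sorted(g for g, desc in CONTEXT.items()
                  if x <= desc and not y <= desc)


ab, cd = set("ab"), set("cd")
print(confidence(cd, ab))             # implication: confidence 1.0
print(round(confidence(ab, cd), 2))   # association rule below 1.0
print(needs_completion(ab, cd))       # candidates for completion
```

If a domain expert accepts $\{a, b\} \equiv \{c, d\}$ as a definition, the entities returned by `needs_completion` are the ones whose descriptions would receive the missing attributes.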

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames    Smartphones       Countries
Restriction  dc:subject        dc:subject    dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type      rdf:type          rdf:type
             dc:subject        dc:subject    dc:subject        dc:subject
             bodyStyle         cp            manufacturer      language
             transmission      developer     operativeSystem   govenmentType
             assembly          requirement   developer         leaderType
             designer          genre         cpu               foundingDate
             layout            releaseDate                     gdpPppRank

FPS = First_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        0.7          5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
- Completing RDF-Data using Association Rule Mining
- Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention

8 References

Codocedo V and Napoli A (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré S and Hermann A (2012). Reconciling faceted search and query languages for the semantic web. IJMSO 7(1):37-54.

Ganter B and Kuznetsov S O (2001). Pattern structures and their projections. In Delugach H S and Stumme G, editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter B and Wille R (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue M, Kuznetsov S O and Napoli A (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov S O and Samokhin M V (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer S and Pfahringer B, editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg A, Buzmakov A, Toussaint Y and Napoli A (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries J, Sacarea C and Ojeda-Aciego M, editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph S (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya B (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz M and Obiedkov S A, editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani M, Bertet K and Ogier J (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may contain different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph for 350GT. 350GT --dc:subject--> dbpc:Sports_cars, 350GT --dc:subject--> dbpc:Lamborghini_vehicles, 350GT --rdf:type--> dbo:Automobile, 350GT --dbo:manufacturer--> dbp:Lamborghini, 350GT --dbo:productionYear--> 1964 (xsd:date)]

Why Pattern Structures

Entity              K_A K_B K_C K_D K_E (attributes a b c d e f g)  K_dbo:productionStartYear
Reventon            × × × × × ×                                     ⟨[2008, 2008]⟩
Countach            × × × × ×                                       ⟨[1974, 1974]⟩
350GT               × × × × ×                                       ⟨[1963, 1963]⟩
400GT               × × × ×                                         ⟨[1965, 1965]⟩
Islero              × × × ×                                         ⟨[1967, 1967]⟩
Veneno              × ×                                             ⟨[2012, 2012]⟩
Aventador Roadster  × ×                                             -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$ in the heterogeneous context with numeric values; then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements $\max(Idl)$: $Idl = \{x \mid \exists y \in \max(Idl) : x \leq y\}$
• The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

• Computing the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array       [ 2  1  0  3  2 ]
Positions     1  2  3  4  5

Computational Time

The problem of finding such a position can be solved with $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
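As a rough illustration of constant-time range minimum queries, here is a minimal sketch using a sparse table. This is an assumption made for brevity: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted on the slide, and the function names `build_sparse_table` and `rmq` are hypothetical. Indices are 0-based in the code, whereas the slide numbers positions from 1.

```python
def build_sparse_table(values):
    """table[k][i] holds the index of a minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half, row = table[-1], 1 << (k - 1), []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """Index of a minimum of values[lo..hi] (0-based, inclusive), in O(1)."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]          # the array from the slide
table = build_sparse_table(values)
print(rmq(values, table, 0, 4) + 1)  # 1-based position of the minimum
```

For the slide's array the minimum over the whole range sits at position 3 (value 0).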

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

One RMQ is computed for every pair of a position in A and a position in B, for example:

• RMQ(4, 4) = 4, giving the candidates {4}
• RMQ(4, 18) = 15, giving the candidates {4, 15}
• RMQ(20, 18) = 19, giving the candidates {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}
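The naive pairwise procedure can be sketched as follows. This is only an illustration under stated assumptions: `TOP` stands for ⊤, the RMQ is a plain linear scan rather than a constant-time structure, and the helper names (`rmq`, `most_specific`, `naive_intersection`) are hypothetical. The final reduction keeps the most specific candidates, which is how {4, 8, 15, 19, 21} shrinks to the antichain {C1, C13}.

```python
from itertools import product

# Depth array D and node labels of the Euler tour from the slide ("TOP" is ⊤).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def is_ancestor_or_self(x, y):
    """x subsumes y when the LCA of x and y is x itself."""
    px, py = NODES.index(x) + 1, NODES.index(y) + 1
    return NODES[rmq(px, py) - 1] == x


def most_specific(nodes):
    """Drop every node that subsumes another candidate."""
    return {n for n in nodes
            if not any(m != n and is_ancestor_or_self(n, m) for m in nodes)}


def naive_intersection(a_positions, b_positions):
    """One RMQ per pair (a, b), i.e. |A| * |B| queries."""
    positions = {rmq(a, b) for a, b in product(a_positions, b_positions)}
    return positions, most_specific({NODES[p - 1] for p in positions})


positions, antichain = naive_intersection([4, 12, 20], [4, 18, 22])
print(sorted(positions), sorted(antichain))
```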

An associated scaling

• Scale the antichains to the corresponding filters.
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

⊤
  C12
    C10
      C1, C2
    C11
      C4, C5
  C6
    C13
      C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
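The scaling can be sketched in a few lines. The parent map below is read off the tree shown on this slide, `TOP` stands for ⊤, and the function names are hypothetical. Because filters are upward closed, a node of the intersection has a more specific node in the intersection exactly when one of its children is in it, which is what `most_specific` exploits.

```python
# Child -> parent map of the taxonomy tree ("TOP" is the root ⊤).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes greater than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def most_specific(upward_closed):
    """Minimal elements of an upward-closed set: nodes with no child in it."""
    parents = {PARENT[n] for n in upward_closed if n in PARENT}
    return upward_closed - parents


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
print(sorted(common), sorted(most_specific(common)))
```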

Complexity

Complexity of the naive approach

The number of RMQs is $O(|A| \cdot |B|)$, one per pair of elements.

Complexity of the improved approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements, and thus the number of RMQs, is $O(|A| + |B|)$.

Complexity of the associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, so intersecting two antichains by means of a scaling costs $O(|T|)$, which is more expensive than the $O(|\mathrm{Leaves}(T)|)$ needed to intersect antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Contributions

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

Interactive Knowledge Discovery + Completion

Related Work

Our contributions are in the direction of the following works:

• Sewelis (Camelis 2) [Ferré and Hermann 2012]

• Navigala [Visani et al. 2011]

• Relational Exploration [Rudolph 2006]

2 Background

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation

• Galois connection

  $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$

  $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$

• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×

[Figure: concept lattice of this context]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$ for $g_2$

• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$ for $m_1$

[Figure: concept lattice of the example context]
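The derivation operators and the object and attribute concepts can be tried out on the running example. The sketch below is an illustration only: the cross positions in `CONTEXT` are an assumption, reconstructed from the examples and the support and confidence values quoted on the slides, and all function names are hypothetical.

```python
# Example context; cross positions are reconstructed from the slide examples.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': attributes shared by every object in A."""
    attrs = set(ATTRIBUTES)
    for g in objects:
        attrs &= CONTEXT[g]
    return attrs


def extent(attributes):
    """B': objects that have every attribute in B."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """(g'', g')"""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """(m', m'')"""
    a = extent({m})
    return a, intent(a)


print(object_concept("g2"))
print(attribute_concept("m1"))
```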

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
  - Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence: $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association rule $m_2 \rightarrow m_3$

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

$\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$
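The support and confidence values above can be recomputed with a short sketch. The context is the same five-object example, with cross positions assumed from the figures quoted on the slides; `support`, `confidence` and `implication_holds` are hypothetical helper names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Example context; cross positions are an assumption consistent with the slides.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """Objects having every attribute in the given set."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'|"""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x, y):
    """X ⇒ Y holds iff X' ⊆ Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}))
```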

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric Data

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5

• Graphs and Molecular Structure

• Syntactic Tree

• Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, a set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
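A minimal sketch of the interval similarity operation, assuming descriptions are stored as tuples of `(low, high)` pairs; `delta` and `meet` are hypothetical names chosen to mirror $\delta$ and $\sqcap$.

```python
def delta(row):
    """Describe a numeric row by degenerate intervals [v, v]."""
    return tuple((v, v) for v in row)


def meet(d1, d2):
    """Component-wise [min(a1, a2), max(b1, b2)]."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


d_g1 = delta((5, 7, 6))
d_g2 = delta((6, 8, 4))
print(meet(d_g1, d_g2))
```

The result matches $\delta(g_1) \sqcap \delta(g_2)$ on the slide.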

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as follows:

• The maximal description representing the similarity of a set of objects:

  $A^\square = \sqcap_{g \in A}\, \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

  $d^\square = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^\square = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^\square = \{g_1, g_2\}$
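The two derivation operators can be sketched on the three-object numeric table. This is an illustration under assumptions: subsumption $d \sqsubseteq \delta(g)$ is tested as $d \sqcap \delta(g) = d$, which is the usual semilattice order, and the names `description` and `extent` are hypothetical.

```python
from functools import reduce

# The three-object numeric table used on the pattern structure slides.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def description(objects):
    """A□: meet of the descriptions of all objects in a non-empty set A."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d):
    """d□: objects g with d ⊑ δ(g), tested as d ⊓ δ(g) == d."""
    return {g for g in DATA if meet(d, delta(g)) == d}


d = description({"g1", "g2"})
print(d, sorted(extent(d)))
```

Object g3 is excluded because its m1 value 4 falls outside [5, 6].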

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]

• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

• Syntactic Tree [Leeuwenberg et al. 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) linked to Object ($U \cup B \cup L$) by a predicate ($U$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

To avoid confusion with RDF objects, the objects of FCA are called entities.

RDF to Entity-Descriptions

tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, with top element ⊤ over the classes C1, C2 and C4 to C15]

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of this array.

⊤
  C12
    C10
      C1, C2
    C11
      C4, C5
  C6
    C13
      C7, C8, C9

The tree is traversed depth-first, and the depth of each visited node is recorded in the array D:

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
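The depth array can be produced by an Euler tour of the taxonomy, as in the sketch below. The child lists are read off the tree on this slide, `TOP` stands for ⊤, and `euler_tour` and `lca` are hypothetical names; the LCA lookup uses a simple linear scan instead of a constant-time RMQ structure.

```python
# Parent -> ordered children of the taxonomy tree ("TOP" is the root ⊤).
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Record each node on entry and again after returning from each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, x, y):
    """Least common ancestor: shallowest node between the first visits of x and y."""
    i, j = sorted((nodes.index(x), nodes.index(y)))
    best = min(range(i, j + 1), key=depths.__getitem__)
    return nodes[best]


nodes, depths = euler_tour(TREE, "TOP")
print(depths)
print(lca(nodes, depths, "C7", "C9"))
```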

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}
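The merged-list variant only queries consecutive positions, so it issues $|A| + |B| - 1$ RMQs. The sketch below reproduces the slide's computation; it is an illustration under assumptions, with `TOP` standing for ⊤, a linear-scan `rmq`, and hypothetical helper names. The last step keeps the most specific candidates, giving the antichain {C1, C13} (positions 4 and 21 on the slide).

```python
# Depth array D and node labels of the Euler tour from the slide ("TOP" is ⊤).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def subsumes(x, y):
    """x is an ancestor of y (or y itself) when their LCA is x."""
    return NODES[rmq(NODES.index(x) + 1, NODES.index(y) + 1) - 1] == x


def most_specific(nodes):
    return {n for n in nodes
            if not any(m != n and subsumes(n, m) for m in nodes)}


def improved_intersection(a_positions, b_positions):
    """Sort the positions of A and B together, then query consecutive pairs."""
    merged = sorted(a_positions + b_positions)
    positions = [rmq(x, y) for x, y in zip(merged, merged[1:])]
    return positions, most_specific({NODES[p - 1] for p in positions})


positions, antichain = improved_intersection([4, 12, 20], [4, 18, 22])
print(positions, sorted(antichain))
```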

Resulting Pattern Concept Lattice

[Figure: pattern concept lattice with top concept K0, bottom concept K11 and intermediate concepts K1 to K10]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of the pattern concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice K0 to K11 with its table of extents and intents]

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given $U$ and the set $V$ of query variables, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords → Classification

[Figure: matching RDF graph: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al. CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2\}$

• $M_1 = \mu(A_{v_1}) = \{author1, author2\}$

• $M_2 = \mu(A_{v_2}) = \{\text{Web Crawling}, \text{RDF}\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
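How a VIEW BY clause might turn a result table into a formal context can be sketched as follows. This is an illustration only, not the authors' implementation: the rows are copied from the result table above, the function name `view_by` is hypothetical, and each attribute is represented as a `(column, value)` pair so that author and keyword values stay distinguishable.

```python
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, entity_column):
    """Values of entity_column become objects; all other bindings become attributes."""
    idx = columns.index(entity_column)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[idx], set())
        for column, value in zip(columns, row):
            if column != entity_column:
                attrs.add((column, value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Changing the `entity_column` argument gives a different perspective on the same answers, for instance papers described by authors and keywords, or authors described by papers and keywords.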

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern: ?x has dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: extended graph pattern: ?x also has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates
Index   URI
A       dc:subject
B       dbp:award
C       rdf:type
D       dbp:field
E       dbp:birthPlace

Objects
Index   URI                              Predicate
a       dbpc:Computer_Scientists         A
b       dbpc:Turing_Award_Laureates      A
c       dbp:TuringAward                  B
d       dbo:Scientist                    C
e       dbp:Computer Sciences            D
f       dbo:UnitedStates                 E
g       dbo:UnitedKingdom                E

[Table: formal context with Person1 to Person7 as rows and the objects a to g, grouped under their predicates A to E, as columns; the rows carry 6, 5, 5, 4, 4, 2 and 2 crosses respectively]

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incomplete data

Rule                  Confidence   Support   Meaning
d ⇒ c                 100%         5         Every scientist has won a Turing Award
c ⇒ d                 100%         5         Every person who has won a Turing Award is a scientist
e ⇒ c, d              100%         2         All the people having the field computer science are Turing Award winning scientists
c, d ⇒ a, b           100%         5         All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d           71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their descriptions
• In fact, there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
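The completion heuristic can be illustrated with a small sketch. The context below is hypothetical: it respects the supports and confidences of the rule table (a and b for all seven persons, c and d for Person1 to Person5, e for two persons), but the exact placement of e and f is an assumption, as are the function names.

```python
from fractions import Fraction

# Hypothetical context consistent with the rule table; e/f placement is assumed.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdf"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attributes):
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def confidence(x, y):
    """conf(X → Y) = |X' ∩ Y'| / |X'|"""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def completion_candidates(x, y):
    """If X ⇒ Y holds, entities with Y but not X may be missing X."""
    if confidence(x, y) != 1:
        return set()
    return extent(y) - extent(x)


conf_ab_cd = confidence("ab", "cd")
candidates = completion_candidates("cd", "ab")
print(float(conf_ab_cd), sorted(candidates))
```

Here c, d ⇒ a, b holds exactly while a, b → c, d reaches a confidence of 5/7, so Person6 and Person7 are the entities whose descriptions would be completed with c and d if the rule is accepted as a definition.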

Possible Scenario [Alam et al. IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames   Smartphones        Countries
Restriction   dc:subject         dc:subject   dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS     dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type     rdf:type           rdf:type
              dc:subject         dc:subject   dc:subject         dc:subject
              bodyStyle          cp           manufacturer       language
              transmission       developer    operativeSystem    govenmentType
              assembly           requirement  developer          leaderType
              designer           genre        cpu                foundingDate
              layout             releaseDate                     gdpPppRank

FPS = Front_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)


RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya, 2010]
• How to deal with near-definitions, i.e. X → Y and Y → X are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may contain different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around 350GT]
350GT  dc:subject          dbpc:Sports_cars
350GT  dc:subject          dbpc:Lamborghini_vehicles
350GT  rdf:type            dbo:Automobile
350GT  dbo:manufacturer    dbp:Lamborghini
350GT  dbo:productionYear  1964 (xsd:date)

Why Pattern Structures

• Triples do not always contain URIs as objects
• They may contain different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear (attributes a b c d e f g)

Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli, 2014]

• Let p ∈ P be a predicate
  - range(p) ⊆ U: p is an object relation
  - range(p) ⊆ L: p is a literal relation
• Pattern structure K_p = (G, (D_p, ⊓), δ_p)
  - (D_p, ⊑) is an ordered set of descriptions defined for the elements in range(p)
  - δ_p maps entities g ∈ G to their descriptions in D_p
• Heterogeneous pattern structure (G, H, Δ)
  - H = ⨉_{p ∈ P} D_p is the Cartesian product of all the description sets D_p
  - Δ maps an entity g ∈ G to a tuple where each component corresponds to a description in a set D_p

Heterogeneous Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear (attributes a b c d e f g)

Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Let A1 = {350GT, 400GT, Islero}, then
• (A1)□□ (binary attributes)
  - (A1)□ = {a, b, c, d}
  - {a, b, c, d}□ = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)□□ = {Reventon, Countach, 350GT, 400GT, Islero}
• (A1)″ (production start year)
  - (A1)′ = [1963, 1967] and [1963, 1967]′ = A1
  - (A1)″ = A1
• (A1)◇◇ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1
• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"
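The closure of A1 can be retraced with a small sketch. It is illustrative only: the production start years are the ones in the table, while the closure over the binary attributes is copied from the slide as a constant, because the positions of the crosses are not recoverable here. The names YEARS, interval_of and entities_in are assumptions made for this example.

```python
# Production start years from the table. Aventador Roadster has no value
# in the table, so it is left out of the numeric part.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}

# Closure of A1 over the binary attributes, taken as stated on the slide.
BINARY_CLOSURE_OF_A1 = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_of(entities):
    """Smallest interval that contains the years of all given entities."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def entities_in(interval):
    """All entities whose year falls inside the interval."""
    low, high = interval
    return {e for e, year in YEARS.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)                # numeric part of the intent
interval_closure = entities_in(interval)  # extent of that interval
heterogeneous_closure = BINARY_CLOSURE_OF_A1 & interval_closure
```

The last line mirrors the intersection on the slide: an entity stays in the extent only if it survives both the binary closure and the interval closure.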


The proposition of [Carpineto and Romano 1996]

• Let K = (G, M, I) be a formal context
• Extended set of attributes M* organized in a subsumption hierarchy based on a partial ordering ≤_M*
• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          ×   ×                   ×
s2                          ×       ×   ×
s3                  ×   ×
s4                  ×               ×   ×
s5                                  ×   ×

[Figure: taxonomy over ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov, 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set (M, ≤_M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M : x ≤ y}
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set
• For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2 : m ≤ ac1 and m ≤ ac2}

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and in O(1) time per query (where n is the number of elements in the array)
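A range minimum query can be sketched with a sparse table. This is a simpler stand-in, not the structure behind the O(n) bound quoted above: it needs O(n log n) preprocessing and answers each query in O(1). The function names are assumptions made for this example; positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # level 0: every index is its own minimum
    span = 1
    while 2 * span <= n:
        prev = table[-1]
        level = []
        for i in range(n - 2 * span + 1):
            left, right = prev[i], prev[i + span]
            level.append(left if values[left] <= values[right] else right)
        table.append(level)
        span *= 2
    return table


def rmq(values, table, low, high):
    """1-based position of the minimum between positions low and high."""
    i, j = low - 1, high - 1
    k = (j - i + 1).bit_length() - 1  # largest power of two that fits
    left = table[k][i]
    right = table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
```

With the array of the slide, rmq(ARRAY, TABLE, 1, 5) gives position 3, where the value 0 sits.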


Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9} we have the positions A = {4, 12, 20} and B = {4, 18, 22}

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4      partial result {4}
RMQ(4, 18) = 15    partial result {4, 15}
RMQ(20, 18) = 19   partial result {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
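The scaling can be retraced with a short sketch. The parent map encodes the taxonomy tree of this example (the one the array D is built from), with the string "TOP" standing for ⊤. The helper names are assumptions made for this example.

```python
# Parent of every node in the example taxonomy; "TOP" (⊤) has no parent.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    nodes = {node}
    while node in PARENT:
        node = PARENT[node]
        nodes.add(node)
    return nodes


def filter_of(antichain):
    """Fil(antichain): everything at or above some element of the antichain."""
    return set().union(*(up_set(c) for c in antichain))


def most_specific(nodes):
    """Drop every node that is a strict ancestor of another node in the set."""
    return {n for n in nodes
            if not any(n != m and n in up_set(m) for m in nodes)}


def meet_by_scaling(a, b):
    """Intersect the two filters, then read the antichain off the result."""
    return most_specific(filter_of(a) & filter_of(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
```

Both filters walk up to the root, so their size grows with the depth and size of the taxonomy rather than with the size of the antichains.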


Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is O(|A| · |B|)

Complexity of Improved Approach

The cardinality of A ∪ B is at most |A| + |B|, hence the number of consecutive elements is O(|A| + |B|), and thus the number of RMQs of consecutive elements is O(|A| + |B|)

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than the O(|Leaves(T)|) needed for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than the O(|Leaves(T)|) needed for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Related Work

Our contributions are in the direction of the following works

• Sewelis (Camelis 2) [Ferré and Hermann, 2012]
• Navigala [Visani et al., 2011]
• Relational Exploration [Rudolph, 2006]

2 Background


Formal Concept Analysis

[Ganter and Wille, 1999]

• A formal context is a triple (G, M, I)
  - G is a set of objects
  - M is a set of attributes
  - I ⊆ G × M is a binary relation
• Galois connection
  - A′ = {m ∈ M | ∀g ∈ A ⊆ G, (g, m) ∈ I}
  - B′ = {g ∈ G | ∀m ∈ B ⊆ M, (g, m) ∈ I}
• (A, B) is a formal concept with extent A = B′ and intent B = A′
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille, 1999]

• Object concept for g ∈ G
  - The object concept is given as (g″, g′)
  - e.g. for g2: ({g2, g5}, {m1, m2})
• Attribute concept for m ∈ M
  - The attribute concept is given as (m′, m″)
  - e.g. for m1: ({g1, g2, g5}, {m1})

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: Concept Lattice]

Association Rules and Implications

Let (G, M, I) be a formal context and X, Y ⊆ M
• X → Y is an association rule with
  - Support σ(X → Y) = |X′ ∩ Y′| / |G|
  - Confidence conf(X → Y) = |X′ ∩ Y′| / |X′|
• The implication X ⇒ Y holds iff X′ ⊆ Y′

Association Rule: m2 → m3
  σ(m2 → m3) = 3/5 = 60%
  conf(m2 → m3) = 3/4 = 75%

Implication: m3 ⇒ m2
  conf(m3 ⇒ m2) = 3/3 = 100%

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
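The support and confidence values of this example can be recomputed with a small sketch. The context below is the five-object table of the slide; the function names are assumptions made for this example, and confidence is only defined when the premise is satisfied by at least one object.

```python
from fractions import Fraction

# Formal context of the slide: object -> attributes it has.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in X."""
    return {g for g, has in CONTEXT.items() if set(attributes) <= has}


def support(x, y):
    """Share of all objects that satisfy both X and Y."""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """Share of the objects satisfying X that also satisfy Y."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x, y):
    """X => Y holds iff X' is included in Y'."""
    return extent(x) <= extent(y)
```

An implication is thus an association rule with confidence 1: here m3 ⇒ m2 holds, while m2 → m3 only reaches 3/4.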



Complex Data

Formal Concept Analysis cannot deal with such complex data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Numeric Data

[Figures: Graphs and Molecular Structure; Syntactic Tree; Linked Data]


Pattern Structures

[Ganter and Kuznetsov, 2001]

• A pattern structure (G, (D, ⊓), δ) is composed of
  - G, a set of objects
  - (D, ⊓), the set of descriptions with a similarity operation ⊓ over them
  - δ, a mapping such that δ(g) ∈ D describes object g

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• δ(g1) = ⟨[5, 5], [7, 7], [6, 6]⟩
• δ(g2) = ⟨[6, 6], [8, 8], [4, 4]⟩
• [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]
• δ(g1) ⊓ δ(g2) = ⟨[5, 6], [7, 8], [4, 6]⟩

Pattern Structures

The Galois connection for (G, (D, ⊓), δ) is defined as

• The maximal description representing the similarity of a set of objects:
  A□ = ⊓_{g ∈ A} δ(g) for A ⊆ G
• The maximal set of objects sharing a given description:
  d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)

Pattern Concept

• {g1, g2}□ = ⟨[5, 6], [7, 8], [4, 6]⟩
• ⟨[5, 6], [7, 8], [4, 6]⟩□ = {g1, g2}
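The two derivation operators can be sketched for interval patterns as follows. The data is the three-object numeric table used for this example; the test d ⊑ δ(g) is written as d ⊓ δ(g) = d. Function names are assumptions made for this example.

```python
from functools import reduce

# Three-object numeric context from the example: object -> (m1, m2, m3).
OBJECTS = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
}


def delta(g):
    """Description of g: one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in OBJECTS[g])


def meet(d1, d2):
    """Similarity of two descriptions: the convex hull, attribute by attribute."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def common_description(objects):
    """Meet of the descriptions of all given objects (set must be non-empty)."""
    return reduce(meet, (delta(g) for g in objects))


def extent_of(description):
    """All objects g whose description is subsumed: d meet delta(g) == d."""
    return {g for g in OBJECTS if meet(description, delta(g)) == description}


pattern = common_description({"g1", "g2"})
```

Here g3 is not in the extent of the pattern because its m1 value 4 lies outside [5, 6].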


Complex Data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Numeric data: Interval Pattern Structures [Kaytoue et al., 2011]

Graphs and Molecular Structure [Kuznetsov and Samokhin, 2005]

Syntactic Tree [Leeuwenberg et al., 2015]

Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object

[Figure: Subject (U ∪ B) linked by a predicate (U) to an Object (U ∪ B ∪ L), where U = URIs, B = blank nodes, L = literals]

Example of RDF Graph for Publications

NapoliLS97  dc:creator  Amedeo_Napoli
NapoliLS97  dc:title    title1
NapoliLS97  dc:subject  Classification

To avoid confusion, we refer to the objects of FCA as entities

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy over ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

ACM Computing Classification Taxonomy |T|

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0


Range Minimum Query - An Implementation

sbquo How to compute the intersection of A ldquo tC1C5C8u and B ldquo tC1C7C9u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uAY B ldquo t4A 4B 12A 18B 20A 22BuRMQ (44) = 4 RMQ (412) = 8 RMQ (1218) = 15 RMQ (1820) = 19 andRMQ (2022) = 21Result = t4 8 15 19 21u ldquo t4 21u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
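The steps on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `PARENT` map is read off the Euler tour above, `"TOP"` stands for ⊤, the `rmq` helper is a plain linear scan rather than a constant-time structure, and querying only neighbouring positions that come from different sets is an assumption of this sketch (in the slide's example every neighbouring pair does).

```python
from typing import Dict, List, Set, Tuple

# Taxonomy read off the slide's Euler tour, as child -> parent; "TOP" is the root.
PARENT: Dict[str, str] = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(root: str = "TOP") -> Tuple[List[str], List[int]]:
    """Return the Euler tour of the tree and the depth of each visited node."""
    children: Dict[str, List[str]] = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)
    tour: List[str] = []
    depth: List[int] = []

    def visit(node: str, d: int) -> None:
        tour.append(node)
        depth.append(d)
        for child in children.get(node, []):
            visit(child, d + 1)
            tour.append(node)  # come back to the parent after each child
            depth.append(d)

    visit(root, 0)
    return tour, depth


def rmq(depth: List[int], i: int, j: int) -> int:
    """1-based position of the minimum depth in positions i..j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depth[p - 1])


def is_ancestor(x: str, y: str) -> bool:
    """True if x lies strictly above y in the taxonomy."""
    while y in PARENT:
        y = PARENT[y]
        if y == x:
            return True
    return False


def intersect(a: Set[str], b: Set[str]) -> Tuple[List[int], Set[str]]:
    """Return the RMQ positions and the resulting antichain for a and b."""
    tour, depth = euler_tour()
    first: Dict[str, int] = {}
    for pos, node in enumerate(tour, start=1):
        first.setdefault(node, pos)  # first occurrence of each node
    merged = sorted([(first[n], "A") for n in a] + [(first[n], "B") for n in b])
    hits = [rmq(depth, p, q)
            for (p, s), (q, t) in zip(merged, merged[1:]) if s != t]
    nodes = {tour[p - 1] for p in hits}
    # Keep only the most specific nodes: drop any node that is above another.
    antichain = {n for n in nodes if not any(is_ancestor(n, m) for m in nodes)}
    return hits, antichain


hits, result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(hits)            # [4, 8, 15, 19, 21]
print(sorted(result))  # ['C1', 'C13']
```

Replacing `rmq` with a preprocessed structure changes only the cost of each query, not the result.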

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject : {C1, C2, C7})
s2           (dc:subject : {C6, C8, C9})
s3           (dc:subject : {C4, C5})
s4           (dc:subject : {C4, C7, C8})
s5           (dc:subject : {C8, C9})

[Figure: taxonomy over the classes ⊤, C1, C2 and C4 to C15]

[Figure: pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1 : {C14})
K2    s1, s3, s4        (p1 : {C12})
K3    s1, s4            (p1 : {C7, C12})
K4    s2, s4, s5        (p1 : {C8})
K5    s3, s4            (p1 : {C4})
K6    s1                (p1 : {C1, C2, C7})
K8    s2, s5            (p1 : {C8, C9})
K7    s4                (p1 : {C4, C7, C8})
K9    s3                (p1 : {C4, C5})
K10   s2                (p1 : {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

How to allow feedback from a domain expert?

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1 : {C14})
K2    s1, s3, s4        (p1 : {C12})
K3    s1, s4            (p1 : {C7, C12})
K4    s2, s4, s5        (p1 : {C8})
K5    s3, s4            (p1 : {C4})
K6    s1                (p1 : {C1, C2, C7})
K8    s2, s5            (p1 : {C8, C9})
K7    s4                (p1 : {C4, C7, C8})
K9    s3                (p1 : {C4, C5})
K10   s2                (p1 : {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
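One simple way to picture the analyst's choice is to select the concepts whose intent mentions C7. The sketch below does only that, on the concept table of this slide; how the actual tool hides or prunes parts of the lattice is not specified here, and the names `CONCEPTS` and `concepts_mentioning` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Concept table from the slide: KID -> (extent, intent).
CONCEPTS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def concepts_mentioning(cls: str) -> List[str]:
    """Concept identifiers whose intent contains the given class."""
    return sorted(k for k, (_, intent) in CONCEPTS.items() if cls in intent)


hidden = concepts_mentioning("C7")
visible = [k for k in CONCEPTS if k not in hidden]
print(hidden)  # ['K3', 'K6', 'K7']
```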


Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs


SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER (
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title, and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification

[Figure: instantiated pattern: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title, and to Classification by dc:subject]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{\text{title}\}$
• $A_v = \{\text{author}, \text{keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{\text{author1}, \text{author2}, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{\text{WebCrawling}, \text{RDF}, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
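To make the role of VIEW BY concrete, the following sketch groups query answers by the chosen entity variable and builds one object-to-attributes table per remaining variable. It is only an illustration under assumptions: the rows are placeholders rather than DBLP data, and `view_by` is a hypothetical helper, not part of any SPARQL engine or of the LBVA implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set

Row = Dict[str, str]

# Illustrative result rows; the values are placeholders, not real DBLP data.
ROWS: List[Row] = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows: List[Row], entity_var: str) -> Dict[str, Dict[str, Set[str]]]:
    """Return {attribute variable: {object: set of attribute values}}."""
    contexts: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        obj = row[entity_var]  # the VIEW BY variable provides the objects G
        for var, value in row.items():
            if var != entity_var:
                contexts[var][obj].add(value)  # one context per attribute variable
    return {var: dict(ctx) for var, ctx in contexts.items()}


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(by_title["keyword"]["title2"] == {"Web Crawling", "RDF"})  # True
```

Calling `view_by(ROWS, "author")` on the same rows gives the author-centred perspective, which is the point of changing the VIEW BY variable.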

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin, and dbp:birthDate ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: graph pattern: ?x has dcterms:subject FrenchFilm, rdf:type Film, and dbo:hasCountry France]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Attributes, grouped by predicate: A = {a, b}, B = {c}, C = {d}, D = {e}, E = {f, g}

Person1: × × × × × ×
Person2: × × × × ×
Person3: × × × × ×
Person4: × × × ×
Person5: × × × ×
Person6: × ×
Person7: × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
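The construction described above can be sketched as follows: every subject becomes an object of the context and every (predicate, object) pair becomes an attribute. This is a minimal sketch; the `Person6` triple and the underscore in `dbp:Computer_Sciences` are assumptions added for illustration, and `build_context` and `common_attributes` are hypothetical helper names.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Attribute = Tuple[str, str]

TRIPLES: List[Triple] = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    # Hypothetical second entity, added only to make the context less trivial.
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
]


def build_context(triples: List[Triple]) -> Dict[str, Set[Attribute]]:
    """Map each subject to its set of (predicate, object) attributes."""
    context: Dict[str, Set[Attribute]] = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))  # one cross
    return context


def common_attributes(context: Dict[str, Set[Attribute]],
                      subjects: List[str]) -> Set[Attribute]:
    """Attributes shared by all the given subjects."""
    return set.intersection(*(context[s] for s in subjects))


context = build_context(TRIPLES)
shared = common_attributes(context, ["Person1", "Person6"])
print(shared)  # {('dc:subject', 'dbpc:Computer_Scientists')}
```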

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete

Rule                         Confidence  Support  Meaning
$d \Rightarrow c$            100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$            100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$         100%        2        All people whose field is computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$      100%        5        All scientists who won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$              71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $\mathrm{conf}(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact, there is such a definition: $\{a, b\} \equiv \{c, d\}$, i.e., $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$
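The confidence values in the table can be checked with a few lines of Python. The context below is an assumption: it agrees with the supports and confidences reported on the slide, but the exact placement of the crosses for e, f and g cannot be recovered from the transcript, so e is simply given to two entities and f, g are left out.

```python
from typing import Dict, Set

# Assumed context, consistent with the rule statistics on the slide.
CONTEXT: Dict[str, Set[str]] = {
    "Person1": set("abcde"),
    "Person2": set("abcde"),
    "Person3": set("abcd"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attrs: Set[str]) -> Set[str]:
    """Entities that have all the given attributes."""
    return {g for g, described_by in CONTEXT.items() if attrs <= described_by}


def confidence(premise: Set[str], conclusion: Set[str]) -> float:
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


print(round(confidence({"a", "b"}, {"c", "d"}), 2))  # 0.71
print(confidence({"c", "d"}, {"a", "b"}))            # 1.0

# Entities that break {a, b} => {c, d}: candidates for completion.
to_complete = sorted(extent({"a", "b"}) - extent({"c", "d"}))
print(to_complete)  # ['Person6', 'Person7']
```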

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Restriction                     Predicates
Cars          dc:subject dbpc:Sports_cars     rdf:type, dc:subject, bodyStyle, transmission, assembly, designer, layout
Videogames    dc:subject dbpc:FPS             rdf:type, dc:subject, cp, developer, requirement, genre, releaseDate
Smartphones   dc:subject dbpc:Smartphones     rdf:type, dc:subject, manufacturer, operativeSystem, developer, cpu
Countries     rdf:type Country                rdf:type, dc:subject, language, governmentType, leaderType, foundingDate, gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

                 Cars    Videogames  Smartphones  Countries
Subjects         529     655         363          3153
Objects          1291    3265        495          8315
Triples          12519   20146       4710         50000
Concepts         14657   31031       1232         13754
Exec. time [s]   1732    1714        0.7          5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth; each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries, and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities


Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF-Data using Association Rule Mining
- Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph of the entity 350GT: dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E (attributes a to g) and K_dbo:productionStartYear

Entity               Crosses over a to g   dbo:productionStartYear
Reventon             × × × × × ×           ⟨[2008, 2008]⟩
Countach             × × × × ×             ⟨[1974, 1974]⟩
350GT                × × × × ×             ⟨[1963, 1963]⟩
400GT                × × × ×               ⟨[1965, 1965]⟩
Islero               × × × ×               ⟨[1967, 1967]⟩
Veneno               × ×                   ⟨[2012, 2012]⟩
Aventador Roadster   × ×                   -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E (attributes a to g) and K_dbo:productionStartYear

Entity               Crosses over a to g   dbo:productionStartYear
Reventon             × × × × × ×           ⟨[2008, 2008]⟩
Countach             × × × × ×             ⟨[1974, 1974]⟩
350GT                × × × × ×             ⟨[1963, 1963]⟩
400GT                × × × ×               ⟨[1965, 1965]⟩
Islero               × × × ×               ⟨[1967, 1967]⟩
Veneno               × ×                   ⟨[2012, 2012]⟩
Aventador Roadster   × ×                   -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then
• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
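The derivation above can be replayed in Python. This is a sketch under assumptions: the years come from the table, but the attribute sets in `ATTRS` are guessed so that they agree with the derivations on the slide (the exact crosses are not recoverable from the transcript), and the closure is computed as the intersection of the two component closures, as in the last bullet.

```python
from typing import Dict, Optional, Set, Tuple

YEARS: Dict[str, Optional[int]] = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}

# Assumed object-relation attributes, chosen to match the slide's derivations.
ATTRS: Dict[str, Set[str]] = {
    "Reventon": set("abcd"), "Countach": set("abcd"), "350GT": set("abcd"),
    "400GT": set("abcd"), "Islero": set("abcd"),
    "Veneno": set("ab"), "Aventador Roadster": set("ab"),
}


def common_attrs(entities: Set[str]) -> Set[str]:
    """Attributes shared by all entities (first derivation, binary part)."""
    return set.intersection(*(ATTRS[e] for e in entities))


def having(attrs: Set[str]) -> Set[str]:
    """Entities that have all the given attributes."""
    return {e for e, own in ATTRS.items() if attrs <= own}


def interval(entities: Set[str]) -> Tuple[int, int]:
    """Smallest interval covering the years of the entities (all must have one)."""
    years = [YEARS[e] for e in entities]
    return min(years), max(years)


def within(bounds: Tuple[int, int]) -> Set[str]:
    """Entities whose year falls inside the interval."""
    lo, hi = bounds
    return {e for e, y in YEARS.items() if y is not None and lo <= y <= hi}


def heterogeneous_closure(entities: Set[str]) -> Set[str]:
    """Intersect the closures obtained in the binary and the interval parts."""
    return having(common_attrs(entities)) & within(interval(entities))


A1 = {"350GT", "400GT", "Islero"}
print(interval(A1))                     # (1963, 1967)
print(heterogeneous_closure(A1) == A1)  # True
```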

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$, organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: taxonomy over the classes ⊤, C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$
• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array       [ 2 1 0 3 2 ]
Positions     1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
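As an illustration of constant-time queries after preprocessing, here is a sparse-table RMQ on the array of this slide. Note that this is a swapped-in, simpler technique: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted above, which relies on more specialised constructions. The class name `SparseTableRMQ` is hypothetical.

```python
from typing import List


class SparseTableRMQ:
    """Range minimum queries with O(n log n) preprocessing and O(1) per query."""

    def __init__(self, values: List[int]) -> None:
        self.values = values
        n = len(values)
        # table[k][i] = 0-based index of the minimum of values[i : i + 2**k]
        self.table: List[List[int]] = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if values[left] <= values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, i: int, j: int) -> int:
        """1-based position of the minimum among positions i..j (inclusive)."""
        lo, hi = min(i, j) - 1, max(i, j) - 1
        k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        best = left if self.values[left] <= self.values[right] else right
        return best + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(1, 5))  # 3
print(rmq.query(4, 5))  # 5
```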

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

An RMQ is computed for every pair of positions $(a, b)$ with $a \in A$ and $b \in B$, for example:
- RMQ(4, 4) = 4, partial result $\{4\}$
- RMQ(4, 18) = 15, partial result $\{4, 15\}$
- RMQ(20, 18) = 19, partial result $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
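The all-pairs variant can be sketched directly on the depth array of the slide. The `rmq` helper here is a plain linear scan, used only to keep the example short, and `naive_candidates` is a hypothetical name.

```python
from itertools import product
from typing import List

# Depth array D of the Euler tour shown on the slide (positions are 1-based).
D: List[int] = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
                1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i: int, j: int) -> int:
    """1-based position of the minimum depth among positions i..j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def naive_candidates(a: List[int], b: List[int]) -> List[int]:
    """One RMQ per pair (p, q) with p in a and q in b: |a| * |b| queries."""
    return sorted({rmq(p, q) for p, q in product(a, b)})


candidates = naive_candidates([4, 12, 20], [4, 18, 22])
print(candidates)  # [4, 8, 15, 19, 21]
```

With three positions in each set this issues nine queries, against five for the neighbouring pairs of the merged list.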

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
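The filter-based computation can be reproduced with a parent map of the tree. This is a minimal sketch: `"TOP"` stands for ⊤, and `filter_of`, `is_below` and `minimal_elements` are hypothetical helper names.

```python
from typing import Dict, Set

# Taxonomy of the slide as child -> parent; "TOP" is the root.
PARENT: Dict[str, str] = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain: Set[str]) -> Set[str]:
    """All nodes equal to or above at least one element of the antichain."""
    result: Set[str] = set()
    for node in antichain:
        result.add(node)
        while node in PARENT:  # climb to the root
            node = PARENT[node]
            result.add(node)
    return result


def is_below(x: str, y: str) -> bool:
    """True if x lies strictly below y in the taxonomy."""
    while x in PARENT:
        x = PARENT[x]
        if x == y:
            return True
    return False


def minimal_elements(nodes: Set[str]) -> Set[str]:
    """Nodes of the set that have no other node of the set below them."""
    return {n for n in nodes if not any(is_below(m, n) for m in nodes)}


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```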

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
    • Heterogeneous Pattern Structures
    • RDF Pattern Structures

2 Background


Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$:
- $G$ is a set of objects
- $M$ is a set of attributes
- $I \subseteq G \times M$ is a binary relation

• Galois connection:

$A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$

$B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$.

• Reduced labeling:
- intents are inherited from top to bottom (top-down)
- extents are inherited from bottom to top (bottom-up)

      m1   m2   m3
g1    ×
g2    ×    ×
g3         ×    ×
g4         ×    ×
g5    ×    ×    ×

[Figure: concept lattice of the context above]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for $g \in G$:
- the object concept is given as $(g'', g')$
- e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$

• Attribute concept for $m \in M$:
- the attribute concept is given as $(m', m'')$
- e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$

[Figure: concept lattice of the running context]
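The derivation operators and the two kinds of concepts can be sketched as follows. The placement of the crosses is an assumption: it is reconstructed so that it agrees with the examples on the slides (object concept of $g_2$, attribute concept of $m_1$, and the rule statistics given later). The function names are illustrative only.

```python
# Running context; the cross positions are reconstructed (assumption).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def extent(attributes):
    """B': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """(g'', g') for a single object g."""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    a = extent({m})
    return a, intent(a)


print(object_concept("g2"))
print(attribute_concept("m1"))
```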

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \rightarrow Y$ is an association rule with
- Support: $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
- Confidence: $conf(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$.

Association rule $m_2 \rightarrow m_3$:

$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$, $conf(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$:

$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$
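A minimal sketch of these measures on the running context, using exact fractions. The cross positions are again an assumption, chosen to agree with the numbers on the slide, and the helper names are hypothetical.

```python
from fractions import Fraction

# Running context; the cross positions are reconstructed (assumption).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'|"""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def implication_holds(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implication_holds({"m3"}, {"m2"}))
```

An implication is thus an association rule whose confidence is exactly 1.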


Complex Data

Formal Concept Analysis cannot deal directly with such complex data:

• Numeric data

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

• Graphs and molecular structures
• Syntactic trees
• Linked Data


Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
- $G$, a set of objects
- $(D, \sqcap)$, the set of descriptions with a similarity operation over them
- $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as follows.

• The maximal description representing the similarity of a set of objects:

$A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
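The interval example can be reproduced with a short sketch. It uses the three-row table from the pattern structure slide; the function names (`delta`, `meet`, `subsumes`, and so on) are illustrative and not from the talk.

```python
from functools import reduce

# The three objects of the numeric example.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Describe an object by one degenerate interval [v, v] per attribute."""
    return [(v, v) for v in DATA[g]]


def meet(d1, d2):
    """Similarity of two descriptions: the convex hull of each pair of intervals."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]


def subsumes(d, e):
    """d ⊑ e: every interval of e lies inside the matching interval of d."""
    return all(a <= c and f <= b for (a, b), (c, f) in zip(d, e))


def common_description(objects):
    """A^□: the meet of the descriptions of all objects in A (A must be non-empty)."""
    return reduce(meet, (delta(g) for g in objects))


def matching_objects(d):
    """d^□: all objects whose description is subsumed by d."""
    return {g for g in DATA if subsumes(d, delta(g))}


d = common_description({"g1", "g2"})
print(d)
print(matching_objects(d))
```

The pair `({g1, g2}, d)` is a pattern concept because applying the two operators in turn returns the same extent and the same description.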

Complex Data

• Numeric data: Interval Pattern Structures [Kaytoue et al. 2011]
• Graphs and molecular structures [Kuznetsov and Samokhin 2005]
• Syntactic trees [Leeuwenberg et al. 2015]
• Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: a subject in $U \cup B$ is linked by a predicate in $U$ to an object in $U \cup B \cup L$. U = URI, B = Blank Nodes, L = Literal.]

Example of RDF Graph for Publications

[Figure: RDF graph with the nodes NapoliLS97, title1, Amedeo_Napoli and Classification, connected by the predicates dc:title, dc:creator and dc:subject.]

To avoid confusion, objects in FCA are called entities from here on.

RDF to Entity-Descriptions

tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy $T$, rooted at $\top$, with the classes C1, C2, C4 to C15.]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
- Extended intersection operation [Carpineto and Romano 1996]
- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree rooted at $\top$. $\top$ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has the child C13; C13 has children C7, C8 and C9.]

Depth array $D$ obtained by a depth-first traversal of the tree, one entry per visit of a node:

Index  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
Node   ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
D      0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Each class is replaced by its position in the depth array $D$ of the taxonomy:

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$ once only the most specific classes are kept, i.e. the antichain $\{C1, C13\}$.
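The whole procedure can be sketched in Python. Several points are assumptions of this sketch rather than statements from the talk: the tree is the one recoverable from the depth array, `TOP` stands for $\top$, the RMQ is a plain linear scan (a sparse table or similar structure would answer queries faster), and only consecutive pairs drawn from different sets are queried, which happens to cover every pair in the slide example.

```python
# Taxonomy as parent -> ordered children; "TOP" stands for the root (assumption).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Depth-first traversal recording a node and its depth at every visit."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")
FIRST, LAST = {}, {}
for pos, name in enumerate(NODES, start=1):  # 1-based positions, as on the slide
    FIRST.setdefault(name, pos)
    LAST[name] = pos


def rmq(i, j):
    """1-based position of the smallest depth in D[i..j]; naive linear scan."""
    return min(range(i, j + 1), key=lambda k: DEPTHS[k - 1])


def is_ancestor(u, v):
    """True if u lies on the path from v to the root (or u == v)."""
    return FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def intersect(a, b):
    """Most specific common generalisations of two antichains."""
    tagged = sorted([(FIRST[n], "A") for n in a] + [(FIRST[n], "B") for n in b])
    candidates = set()
    for (i, s), (j, t) in zip(tagged, tagged[1:]):
        if s != t:  # assumption: only pairs taken from different sets
            candidates.add(NODES[rmq(i, j) - 1])
    return {n for n in candidates
            if not any(n != m and is_ancestor(n, m) for m in candidates)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

Sorting the positions first is what keeps the number of queries linear in $|A| + |B|$ instead of quadratic.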

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy $T$, rooted at $\top$, with the classes C1, C2, C4 to C15.]

[Figure: pattern concept lattice with the concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of the pattern concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ      t_scaling
BK        96         5            35    626    10            840897    37 sec     42 sec
LO        16         7            16    224    26            1875      0043 sec   0088 sec
NT        131        3            131   140    6             128624    36 sec     68 sec
PO        60         16           22    1236   58            416837    49 sec     57 sec
PT        5000       49           22    4084   60            452316    50 sec     38 sec
PW        200        11           94    436    21            1148656   60 sec     49 sec
PY        74         28           36    340    53            771569    46 sec     40 sec
QU        2178       4            44    8212   8             783013    28 sec     30 sec
TZ        186        61           31    626    88            650041    58 sec     43 sec
VY        52         4            52    202    15            202666    59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach ()

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.
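One simple way to picture the effect of such feedback is to hide every concept whose intent mentions the unwanted class. The sketch below does exactly that on the concept table; it is a hypothetical illustration, since the slides do not spell out how the navigation space is actually altered, and `hide_class` is an invented helper name.

```python
# Concept table from the slide: KID -> (extent, intent classes).
CONCEPTS = {
    "K1": ({"s1", "s2", "s4", "s5"}, {"C14"}),
    "K2": ({"s1", "s3", "s4"}, {"C12"}),
    "K3": ({"s1", "s4"}, {"C7", "C12"}),
    "K4": ({"s2", "s4", "s5"}, {"C8"}),
    "K5": ({"s3", "s4"}, {"C4"}),
    "K6": ({"s1"}, {"C1", "C2", "C7"}),
    "K7": ({"s4"}, {"C4", "C7", "C8"}),
    "K8": ({"s2", "s5"}, {"C8", "C9"}),
    "K9": ({"s3"}, {"C4", "C5"}),
    "K10": ({"s2"}, {"C6", "C8", "C9"}),
}


def hide_class(concepts, unwanted):
    """Return the concepts whose intent does not mention the unwanted class."""
    return {kid: c for kid, c in concepts.items() if unwanted not in c[1]}


visible = hide_class(CONCEPTS, "C7")
print(sorted(visible))
```

With C7 hidden, the concepts K3, K6 and K7 drop out of the view while the rest of the lattice stays navigable.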


4 Creating Views over RDF-Graphs


SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query. ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject.]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification.

[Figure: matching RDF graph with the nodes NapoliLS97, title1, Amedeo_Napoli and Classification, connected by dc:title, dc:creator and dc:subject.]


Lattice-Based View Access [Alam et al. CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
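The step from a result table to a formal context can be sketched as follows: the VIEW BY variable supplies the entities, and every other column contributes attributes. The rows are the ones listed on the slide; `view_by` is a hypothetical helper written for this illustration, not part of any SPARQL engine.

```python
# Result rows of the query, in the column order given by COLUMNS.
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, entity_column):
    """Group rows by one column; the remaining columns become (column, value) attributes."""
    idx = columns.index(entity_column)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[idx], set())
        for column, value in zip(columns, row):
            if column != entity_column:
                attributes.add((column, value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(by_title["title1"])
print(sorted(by_author["author2"]))
```

Changing the `entity_column` argument gives a different context, and hence a different lattice, over the same query results.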

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

• People who were born in Berlin before 1900

[Figure: graph pattern. ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate that is at most 1900.]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

• French Films

[Figure: graph pattern. ?x has dcterms:subject FrenchFilm; ?x has rdf:type Film and dbo:hasCountry France.]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates              Objects
Index   URI             Index   URI
A       dc:subject      a       dbpc:Computer_Scientists
                        b       dbpc:Turing_Award_Laureates
B       dbp:award       c       dbp:TuringAward
C       rdf:type        d       dbo:Scientist
D       dbp:field       e       dbp:Computer Sciences
E       dbp:birthPlace  f       dbo:UnitedStates
                        g       dbo:UnitedKingdom

Formal context over the attributes a, b (A), c (B), d (C), e (D), f, g (E); number of crosses per entity:

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule                     Confidence   Support   Meaning
$d \Rightarrow c$        100%         5         Every scientist has won a Turing Award
$c \Rightarrow d$        100%         5         Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$     100%         2         Every person having the field computer science is a Turing Award winning scientist
$c, d \Rightarrow a, b$  100%         5         All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$  71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
- $c, d \Rightarrow a, b$
- $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
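The completion idea can be sketched on the attributes a to d of the running example. The cross positions for these four columns are an assumption of this sketch, reconstructed from the rule statistics on the slide (a and b held by all seven persons, c and d by Person1 to Person5); the function names are hypothetical.

```python
from fractions import Fraction

# Columns a-d of the running context; positions reconstructed (assumption).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities that have every attribute in `attributes`."""
    return {e for e, attrs in CONTEXT.items() if set(attributes) <= attrs}


def implication_holds(x, y):
    """X => Y holds iff every entity having X also has Y."""
    return extent(x) <= extent(y)


def confidence(x, y):
    """Share of the entities having X that also have Y."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def completion_candidates(x, y):
    """Entities having X but lacking Y: candidates for adding the triples behind Y."""
    return extent(x) - extent(y)


X, Y = {"c", "d"}, {"a", "b"}
print(implication_holds(X, Y), float(confidence(Y, X)))
print(sorted(completion_candidates(Y, X)))
```

Whether the near-definition is accepted, and the missing triples added, remains a decision for the domain expert.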

Possible Scenario [Alam et al. IJCAI 2015]

[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Restriction                     Predicates
Cars          dc:subject dbpc:Sports_cars     rdf:type, dc:subject, bodyStyle, transmission, assembly, designer, layout
Videogames    dc:subject dbpc:FPS             rdf:type, dc:subject, cp, developer, requirement, genre, releaseDate
Smartphones   dc:subject dbpc:Smartphones     rdf:type, dc:subject, manufacturer, operativeSystem, developer, cpu
Countries     rdf:type Country                rdf:type, dc:subject, language, govenmentType, leaderType, foundingDate, gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         07            5982

Dataset characteristics

Evaluation

• No ground truth exists for the triples to be added to each dataset
• Human evaluation is used as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

[Figure: process of interactively exploring RDF data]

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. when $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O. and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y. and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C. and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K. and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may contain different data types and structures, including dates, numbers, collections, and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph describing the Lamborghini 350GT]
350GT rdf:type dbo:Automobile
350GT dbo:manufacturer dbp:Lamborghini
350GT dc:subject dbpc:Sports_cars
350GT dc:subject dbpc:Lamborghini_vehicles
350GT dbo:productionYear 1964 (xsd:date)

Why Pattern Structures

• Triples do not always contain URIs as objects

• They may contain different data types and structures, including dates, numbers, collections, and strings

• To deal with such data we use Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear (attribute columns a to g)

Entity | binary attributes | dbo:productionStartYear
Reventon | × × × × × × | ⟨[2008, 2008]⟩
Countach | × × × × × | ⟨[1974, 1974]⟩
350GT | × × × × × | ⟨[1963, 1963]⟩
400GT | × × × × | ⟨[1965, 1965]⟩
Islero | × × × × | ⟨[1967, 1967]⟩
Veneno | × × | ⟨[2012, 2012]⟩
Aventador Roadster | × × | -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
- $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
- $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structures $(G, H, \Delta)$:
- $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear (attribute columns a to g)

Entity | binary attributes | dbo:productionStartYear
Reventon | × × × × × × | ⟨[2008, 2008]⟩
Countach | × × × × × | ⟨[1974, 1974]⟩
350GT | × × × × × | ⟨[1963, 1963]⟩
400GT | × × × × | ⟨[1965, 1965]⟩
Islero | × × × × | ⟨[1967, 1967]⟩
Veneno | × × | ⟨[2012, 2012]⟩
Aventador Roadster | × × | -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then

• $(A_1)^{\circ\circ}$ (binary part)
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$ (interval part)
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^{*}$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^{*}}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x           x   x
s3                  x   x
s4                  x               x   x
s5                                      x   x

[Figure: subsumption hierarchy over ⊤, C10 to C15, and the classes C1, C2, C4 to C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$

• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the Range Minimum Query (RMQ) problem

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2  1  0  3  2 ]
Positions    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
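To make the RMQ problem concrete, here is a minimal sketch using a sparse table. This is a swapped-in, simpler technique: it needs O(n log n) preprocessing rather than the O(n) bound quoted on the slide, while still answering each query in O(1). The function names are illustrative and do not come from the talk.

```python
def build_sparse_table(values):
    """Precompute, for every power-of-two window, the position of its minimum."""
    n = len(values)
    table = [list(range(n))]  # level 0: windows of length 1
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        level = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            level.append(left if values[left] <= values[right] else right)
        table.append(level)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """0-based position of a minimal value in values[lo..hi] (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    j = (hi - lo + 1).bit_length() - 1
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


# The array from the slide; the slide numbers positions from 1.
array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
position = rmq(array, table, 0, 4) + 1
```

For the whole array, `position` is the 1-based index of the value 0.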

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the Euler tour are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

Tree:
⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
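The scaling-based intersection can be sketched as follows. The parent map is an assumption: it encodes the example tree as it can be read off the Euler tour shown on the RMQ slides, with "TOP" standing for ⊤. The helper names are illustrative.

```python
# Child -> parent map for the example tree (assumed encoding; "TOP" is the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """The node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def filter_of(antichain):
    """All elements that are above at least one element of the antichain."""
    result = set()
    for node in antichain:
        result.update(ancestors_or_self(node))
    return result


def minimal_elements(nodes):
    """Keep the nodes that have no proper descendant inside `nodes`."""
    nodes = set(nodes)
    has_descendant = set()
    for node in nodes:
        has_descendant.update(a for a in ancestors_or_self(node)[1:] if a in nodes)
    return nodes - has_descendant


def intersect_by_scaling(a, b):
    """Intersect the two filters, then read the antichain back off."""
    return minimal_elements(filter_of(a) & filter_of(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
result = intersect_by_scaling(A, B)
```

Each filter can contain up to |T| elements, which is why this route costs more than working on the antichains directly.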

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, since every element of $A$ is paired with every element of $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|\mathrm{Leaves}(T)|)$ needed to intersect antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|\mathrm{Leaves}(T)|)$ needed to intersect antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures


Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple $(G, M, I)$
- $G$ is a set of objects
- $M$ is a set of attributes
- $I$ is a binary relation

• Galois connection

$A' = \{m \in M \mid \forall g \in A : (g, m) \in I\}$ for $A \subseteq G$

$B' = \{g \in G \mid \forall m \in B : (g, m) \in I\}$ for $B \subseteq M$

• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$

• Reduced Labeling
- intents are inherited from top to bottom (top-down)
- extents are inherited from bottom to top (bottom-up)

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: Concept Lattice]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object Concept for $g \in G$
- The object concept is given as $(g'', g')$
- e.g., $(\{g_2, g_5\}, \{m_1, m_2\})$

• Attribute Concept for $m \in M$
- The attribute concept is given as $(m', m'')$
- e.g., $(\{g_1, g_2, g_5\}, \{m_1\})$

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: Concept Lattice]
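The derivation operators and the two example concepts can be checked with a short sketch. The context below is an assumption: the cross positions were lost in the transcript and are inferred here from the examples on the slides (the object concept of g2, the attribute concept of m1, and the rule statistics given for m2 and m3). Function names are illustrative.

```python
# Formal context as object -> set of attributes (cross positions are inferred).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """(g'', g') for a single object g."""
    g_prime = intent({g})
    return extent(g_prime), g_prime


def attribute_concept(m):
    """(m', m'') for a single attribute m."""
    m_prime = extent({m})
    return m_prime, intent(m_prime)


concept_g2 = object_concept("g2")
concept_m1 = attribute_concept("m1")
```

Under this context, `concept_g2` and `concept_m1` reproduce the two examples given on the slide.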

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \to Y$ is an association rule with
- Support $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
- Confidence $\mathrm{conf}(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule $m_2 \to m_3$

$\sigma(m_2 \to m_3) = 3/5 = 60\%$, $\mathrm{conf}(m_2 \to m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

$\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
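The support and confidence values above can be recomputed with a small sketch. As before, the context is an assumption inferred from the examples on the slides, and the function names are illustrative.

```python
from fractions import Fraction

# Formal context as object -> set of attributes (cross positions are inferred).
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'| (assumes X' is non-empty)."""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def is_implication(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


sup = support({"m2"}, {"m3"})
conf = confidence({"m2"}, {"m3"})
```

Exact fractions are used so that the results can be compared with the 3/5 and 3/4 on the slide without rounding.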


Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric Data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

• Graphs and Molecular Structure

• Syntactic Tree

• Linked Data


Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
- $G$, a set of objects
- $(D, \sqcap)$, the set of descriptions with a similarity operation over them
- $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:

• The maximal description representing the similarity of a set of objects

$A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
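The interval example can be replayed with a minimal sketch. It uses the three-row table from the previous slide; the function names are illustrative, and subsumption is tested through the usual identity that d is subsumed by e exactly when the meet of d and e equals d.

```python
from functools import reduce

# delta(g): one degenerate interval per numeric attribute (three-row table).
DELTA = {
    "g1": ((5, 5), (7, 7), (6, 6)),
    "g2": ((6, 6), (8, 8), (4, 4)),
    "g3": ((4, 4), (8, 8), (5, 5)),
}


def meet(d1, d2):
    """Similarity of two interval vectors: component-wise convex hull."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))


def describe(objects):
    """Maximal common description of a non-empty set of objects."""
    return reduce(meet, (DELTA[g] for g in objects))


def objects_matching(d):
    """All objects whose description is subsumed by d (meet(d, e) == d)."""
    return {g for g, e in DELTA.items() if meet(d, e) == d}


pattern = describe({"g1", "g2"})
extent = objects_matching(pattern)
```

Here g3 is excluded because its m1 value, 4, falls outside the interval [5, 6].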

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al 2011]

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

• Syntactic Tree [Leeuwenberg et al 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources contain only RDF triples

• Some resources contain only schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on the RDF Schema

• Allow simultaneous access to RDF triples and the RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$, and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate, and $o$ is an object.

[Figure: a triple as a labeled edge: subject ($U \cup B$), predicate ($U$), object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

NapoliLS97 dc:creator Amedeo_Napoli
NapoliLS97 dc:title title1
NapoliLS97 dc:subject Classification

To avoid confusion, we will refer to objects in FCA as entities.

RDF to Entity-Descriptions

tid | Subject | Predicate | Object | Provenance
t1 | s1 | dc:subject | o11 | DBLP
t2 | s2 | dc:subject | o16 | DBLP
t3 | s1 | rdf:type | Publication | DBLP
t4 | o11 | rdfs:subClassOf | C1 | ACCS
t5 | o12 | rdfs:subClassOf | C2 | ACCS
t6 | C1 | rdfs:subClassOf | C10 | ACCS

RDF triples from several resources

Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T over ⊤, C10 to C15, and the classes C1, C2, C4 to C9]
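One way to read the step from triples to entity descriptions is sketched below. It uses only the six triples listed in the table, so the resulting descriptions are smaller than those on the slide. The join through rdfs:subClassOf and the decision to drop keywords without a known class are assumptions made for illustration, not the method confirmed by the talk.

```python
from collections import defaultdict

# (subject, predicate, object) triples t1..t6 from the table above.
TRIPLES = [
    ("s1", "dc:subject", "o11"),
    ("s2", "dc:subject", "o16"),
    ("s1", "rdf:type", "Publication"),
    ("o11", "rdfs:subClassOf", "C1"),
    ("o12", "rdfs:subClassOf", "C2"),
    ("C1", "rdfs:subClassOf", "C10"),
]


def entity_descriptions(triples, predicate="dc:subject"):
    """Map each entity to the taxonomy classes of its `predicate` values."""
    parent = {s: o for s, p, o in triples if p == "rdfs:subClassOf"}
    descriptions = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            descriptions[s]  # register the entity even if nothing maps
            if o in parent:  # assumption: unmapped keywords are dropped
                descriptions[s].add(parent[o])
    return dict(descriptions)


DESCRIPTIONS = entity_descriptions(TRIPLES)
```

With only these six triples, s1 is described by C1, and s2 has an empty description because the class of o16 is not listed.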

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets

• Scaling
• Intersection of antichains
- Extended intersection operation [Carpineto and Romano 1996]
- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

Tree:
⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

Euler tour of the tree with depth array D:

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
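The construction of the Euler tour and the LCA lookup can be sketched as follows. The child lists are an assumption read off the tour on the slide, with "TOP" standing for ⊤. The RMQ here is a plain linear scan for readability, not the O(1) structure mentioned in the talk.

```python
# Example tree as node -> ordered children (assumed encoding; "TOP" is the root).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Depth-first walk that records a node each time it is entered or re-entered."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")

# First 1-based position of every node in the tour.
FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j (inclusive)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])


def lca(u, v):
    """Lowest common ancestor: the shallowest node between the first visits."""
    return NODES[rmq(FIRST[u], FIRST[v]) - 1]
```

The tour produced this way has 25 entries and the same depth array D as the slide, so positions such as 4 for C1 and 20 for C8 carry over directly.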


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19, and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}
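The improved procedure, with one RMQ per pair of consecutive positions in the merged list, can be sketched as follows. The tour and depth arrays are copied from the slide, with "TOP" standing for ⊤. Two points are assumptions of this sketch: only consecutive positions that come from different antichains are queried, and the final reduction to an antichain uses an ancestor test based on the same RMQ.

```python
# Euler tour and depth array as shown on the slide ("TOP" stands for the root).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First 1-based position of every node in the tour.
FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)


def rmq(i, j):
    """1-based position of the minimal depth between i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])


def is_proper_ancestor(u, v):
    """True if u lies strictly above v in the tree."""
    return u != v and NODES[rmq(FIRST[u], FIRST[v]) - 1] == u


def intersect(a, b):
    """Return the RMQ positions and the resulting antichain of nodes."""
    merged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[y], "B") for y in b])
    # Assumption: query only neighbours that come from different antichains.
    positions = [rmq(p, q) for (p, s), (q, t) in zip(merged, merged[1:]) if s != t]
    nodes = {NODES[p - 1] for p in positions}
    antichain = {n for n in nodes if not any(is_proper_ancestor(n, m) for m in nodes)}
    return positions, antichain


positions, antichain = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

On this example every pair of neighbours in the merged list mixes A and B, so the five queries match the ones on the slide, and positions 4 and 21 (or 19) correspond to C1 and C13.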

Resulting Pattern Concept Lattice

Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})

[Figure: taxonomy over ⊤, C10 to C15, and the classes C1, C2, C4 to C9]

[Figure: Pattern Concept lattice with concepts K0 to K11]

KID | Extent | Intent
K1 | s1, s2, s4, s5 | (p1, {C14})
K2 | s1, s3, s4 | (p1, {C12})
K3 | s1, s4 | (p1, {C7, C12})
K4 | s2, s4, s5 | (p1, {C8})
K5 | s3, s4 | (p1, {C4})
K6 | s1 | (p1, {C1, C2, C7})
K7 | s4 | (p1, {C4, C7, C8})
K8 | s2, s5 | (p1, {C8, C9})
K9 | s3 | (p1, {C4, C5})
K10 | s2 | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset | |G| | |T| | |Leaves(T)| | |L| | t_RMQ | t_scaling
DBLP | 5293 | 33207 | 33198 | 10134 | 45s | 21s
Biomedical Data | 63 | 1490 | 933 | 1725582 | 145s | 162s

Experiments on Linked Data

Dataset | entities | attributes | |G| | |T| | |Leaves(T)| | |L| | t_RMQ | t_scaling
BK | 96 | 5 | 35 | 626 | 10 | 840897 | 37 sec | 42 sec
LO | 16 | 7 | 16 | 224 | 26 | 1875 | 0.043 sec | 0.088 sec
NT | 131 | 3 | 131 | 140 | 6 | 128624 | 36 sec | 68 sec
PO | 60 | 16 | 22 | 1236 | 58 | 416837 | 49 sec | 57 sec
PT | 5000 | 49 | 22 | 4084 | 60 | 452316 | 50 sec | 38 sec
PW | 200 | 11 | 94 | 436 | 21 | 1148656 | 60 sec | 49 sec
PY | 74 | 28 | 36 | 340 | 53 | 771569 | 46 sec | 40 sec
QU | 2178 | 4 | 44 | 8212 | 8 | 783013 | 28 sec | 30 sec
TZ | 186 | 61 | 31 | 626 | 88 | 650041 | 58 sec | 43 sec
VY | 52 | 4 | 52 | 202 | 15 | 202666 | 59 sec | 116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID | Extent | Intent
K1 | s1, s2, s4, s5 | (p1, {C14})
K2 | s1, s3, s4 | (p1, {C12})
K3 | s1, s4 | (p1, {C7, C12})
K4 | s2, s4, s5 | (p1, {C8})
K5 | s3, s4 | (p1, {C4})
K6 | s1 | (p1, {C1, C2, C7})
K7 | s4 | (p1, {C4, C7, C8})
K8 | s2, s5 | (p1, {C8, C9})
K9 | s3 | (p1, {C4, C5})
K10 | s2 | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs


SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper dc:creator ?author; ?paper dc:title ?title; ?paper dc:subject ?keywords]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification

[Figure: matching RDF graph: NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:title title1; NapoliLS97 dc:subject Classification]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{Web\ Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
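The way VIEW BY turns a result table into a formal context can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the rows are the ones listed on the slide, and the helper name `view_by` is hypothetical. The values bound to the VIEW BY variable become the entities, and the values of all other variables become their attributes.

```python
# Rows of the SPARQL result as (title, keyword, author), copied from the slide.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, entity_var):
    """Hypothetical helper: build a formal context from a result table.

    Entities are the values of `entity_var`; the attributes of an entity
    are the values of every other variable on the same rows.
    """
    e = columns.index(entity_var)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[e], set())
        attrs.update(value for i, value in enumerate(row) if i != e)
    return context


context = view_by(ROWS, COLUMNS, "title")
for entity in sorted(context):
    print(entity, sorted(context[entity]))
```

Calling `view_by(ROWS, COLUMNS, "author")` instead gives the same answers organized from the author's perspective, which is the idea behind switching the VIEW BY variable.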

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x linked to Person by rdf:type, to Berlin by dbp:birthPlace, and to a date ≤ 1900 by dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x linked to FrenchFilm by dcterms:subject]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

Problem Statement

French Films

[Figure: ?x linked to FrenchFilm by dcterms:subject, to Film by rdf:type, and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Index  Predicate        Index  Object
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Formal context over the attributes A: a, b; B: c; C: d; D: e; E: f, g

Person1: × × × × × ×
Person2: × × × × ×
Person3: × × × × ×
Person4: × × × ×
Person5: × × × ×
Person6: × ×
Person7: × ×

The formal context is built from the RDF triples of DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
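The scaling step can be illustrated with a short Python sketch. It assumes the simplest possible scaling, in which every (predicate, object) pair becomes one binary attribute; the function name `to_formal_context` is hypothetical and the four triples are the ones shown on the slide.

```python
# RDF triples from the slide, written as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def to_formal_context(triples):
    """Hypothetical helper: return (G, M, I) for a list of triples.

    G holds the subjects, M holds one attribute per (predicate, object)
    pair, and I holds one (subject, attribute) cross per triple.
    """
    objects, attributes, incidence = set(), set(), set()
    for s, p, o in triples:
        attribute = (p, o)
        objects.add(s)
        attributes.add(attribute)
        incidence.add((s, attribute))
    return objects, attributes, incidence


G, M, I = to_formal_context(TRIPLES)
print(sorted(G))
print(len(M), "attributes,", len(I), "crosses")
```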

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete

Rule                     Confidence  Support  Meaning
$d \Rightarrow c$        100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$        100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$     100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$  100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$          71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
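The reasoning of the running example can be replayed in a few lines of Python. This is a sketch under an explicit assumption: the context below keeps only the attributes a to d and is reconstructed from the rule table (Person1 to Person5 have a, b, c, d; Person6 and Person7 have only a, b), since the full cross table is not reproduced here. The function names are hypothetical.

```python
# Assumed context restricted to attributes a-d, consistent with the rule table.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities that have every attribute in `attrs`."""
    return {g for g, has in CONTEXT.items() if attrs <= has}


def confidence(x, y):
    """Confidence of the rule x -> y."""
    return len(extent(x | y)) / len(extent(x))


X, Y = {"c", "d"}, {"a", "b"}
print(confidence(X, Y))            # the implication c,d => a,b holds
print(round(confidence(Y, X), 2))  # the converse is only an association rule

# Entities that break the converse are the candidates for completion.
candidates = extent(Y) - extent(X)
print(sorted(candidates))
```

If the converse rule is accepted as part of a definition, the entities in `candidates` are exactly those whose descriptions would be completed with the missing attributes c and d.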

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow going through Reference Universe, Formal Context, Mining Implications and Ranking Implications, then the test "Can a rule be a definition $X \equiv Y$?"; if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

             Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

               Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  1732    1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

[Figure: process of interactively exploring RDF data]

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya, 2010]
• How to deal with near-definitions, i.e. cases where $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relation, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis, 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data, we use Pattern Structures

[Figure: RDF graph of 350GT, linked to dbpc:Sports_cars and dbpc:Lamborghini_vehicles by dc:subject, to dbo:Automobile by rdf:type, to dbp:Lamborghini by dbo:manufacturer, and to 1964 (xsd:date) by dbo:productionYear]

Why Pattern Structures

Heterogeneous context with numeric values, over the binary contexts $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g) and the numeric context $K_{dbo:productionStartYear}$:

Entity              Binary attributes  dbo:productionStartYear
Reventon            × × × × × ×        $\langle [2008, 2008] \rangle$
Countach            × × × × ×          $\langle [1974, 1974] \rangle$
350GT               × × × × ×          $\langle [1963, 1963] \rangle$
400GT               × × × ×            $\langle [1965, 1965] \rangle$
Islero              × × × ×            $\langle [1967, 1967] \rangle$
Veneno              × ×                $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×                -

Heterogeneous Pattern Structures [Codocedo and Napoli, 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\square\square}$, on the binary attributes:
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$, on the production start year:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano, 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$, organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: subsumption hierarchy rooted at $\top$ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov, 2001]

• In [Ganter and Kuznetsov, 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:      [ 2  1  0  3  2 ]
Positions:    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
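A range minimum query structure can be sketched in Python as follows. Note the assumption: this sketch uses a sparse table, which needs $O(n \log n)$ preprocessing rather than the $O(n)$ scheme mentioned on the slide, but it keeps the $O(1)$ query time and is much shorter to write. The function names are hypothetical, and indices are 0-based, so position 3 on the slide is index 2 here.

```python
def build_sparse_table(values):
    """table[j][i] is the index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]  # ranges of length 1
    j = 1
    while (1 << j) <= n:
        prev, half, row = table[-1], 1 << (j - 1), []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the minimal value in values[lo..hi] (inclusive, 0-based)."""
    k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]  # the array from the slide
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 0, 4) + 1)  # 1-based position of the minimum
```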

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the positions of their elements in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

- $RMQ(4, 4) = 4$, partial result $\{4\}$
- $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
- $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$
- Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C1, C13\}$
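The naive approach can be reproduced with a short Python sketch. The assumptions are stated up front: the tree is the one read off the depth array on the slide, "TOP" stands for the top element, the range minimum is computed by a plain scan instead of a preprocessed RMQ structure, and all function names are hypothetical.

```python
from itertools import product

# Taxonomy read off the slide (parent -> children); "TOP" is the top element.
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Nodes and depths in the order of a depth-first walk that returns to the parent."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in TREE.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")
FIRST = {}
for index, name in enumerate(NODES):
    FIRST.setdefault(name, index)  # first (0-based) position of each node


def lca(u, v):
    """Least common ancestor: the shallowest node between the two positions."""
    lo, hi = sorted((FIRST[u], FIRST[v]))
    return NODES[min(range(lo, hi + 1), key=DEPTHS.__getitem__)]


def meet(a, b):
    """Pairwise LCAs of two antichains, keeping only the most specific ones."""
    found = {lca(x, y) for x, y in product(a, b)}
    return {c for c in found
            if not any(c != d and lca(c, d) == c for d in found)}


print(sorted(meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

The quadratic number of `lca` calls in `meet` is what makes this approach naive; sorting the positions first is what the improved approach exploits.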

An associated scaling

• Scale the antichains to the corresponding filters
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain

[Figure: taxonomy with $\top$ above C12 and C6; C12 above C10 and C11; C10 above C1 and C2; C11 above C4 and C5; C6 above C13; C13 above C7, C8 and C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
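The filter-based computation is easy to check with a small Python sketch. It assumes the tree shown on the slide, encoded as a child-to-parent map with "TOP" standing for the top element; the function names are hypothetical.

```python
# Child -> parent map of the taxonomy on the slide; "TOP" is the top element.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(n) for n in antichain))


def most_specific(nodes):
    """Drop every node that is a strict ancestor of another node in the set."""
    return {n for n in nodes
            if not any(m != n and n in up_set(m) for m in nodes)}


A, B = {"C1", "C5", "C8"}, {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common))
print(sorted(most_specific(common)))
```

Each filter can contain a large part of the taxonomy, which is why this route costs more than intersecting the antichains directly.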

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is costlier than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures


Formal Concept Analysis [Ganter and Wille, 1999]

• A formal context is a triple $(G, M, I)$
  - $G$ is a set of objects
  - $M$ is a set of attributes
  - $I \subseteq G \times M$ is a binary relation
• Galois connection
  - $A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$ for $A \subseteq G$
  - $B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$ for $B \subseteq M$
• $(A, B)$ is a formal concept with extent $A = B'$ and intent $B = A'$
• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×

[Figure: concept lattice of the context]

Formal Concept Analysis [Ganter and Wille, 1999]

• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$ for $g_2$
• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$ for $m_1$
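The two derivation operators and the object and attribute concepts can be sketched in Python. The cross table used here is an assumption in the sense that it is reconstructed from the examples on the slides (it is the only placement of crosses consistent with them); the function names are hypothetical.

```python
# Cross table of the running example: object -> set of attributes.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': attributes shared by all the given objects."""
    if not objects:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[g] for g in objects))


def extent(attributes):
    """B': objects having all the given attributes."""
    return {g for g, has in CONTEXT.items() if set(attributes) <= has}


def object_concept(g):
    """(g'', g') for an object g."""
    b = intent({g})
    return extent(b), b


def attribute_concept(m):
    """(m', m'') for an attribute m."""
    a = extent({m})
    return a, intent(a)


print(object_concept("g2"))
print(attribute_concept("m1"))
```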

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$.

• $X \to Y$ is an association rule with
  - Support: $\sigma(X \to Y) = \dfrac{|X' \cap Y'|}{|G|}$
  - Confidence: $conf(X \to Y) = \dfrac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association rule $m_2 \to m_3$

$\sigma(m_2 \to m_3) = 3/5 = 60\%$
$conf(m_2 \to m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$

$conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$

      m1  m2  m3
g1    ×
g2    ×   ×
g3        ×   ×
g4        ×   ×
g5    ×   ×   ×
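The figures of this example can be recomputed with a few lines of Python. This is only a sketch: the context is the running example with its crosses placed so that they agree with the numbers on the slide, and the function names are hypothetical.

```python
# Running example: object -> set of attributes.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X': objects having all the attributes of X."""
    return {g for g, has in CONTEXT.items() if attributes <= has}


def support(x, y):
    """|X' ∩ Y'| / |G|; X' ∩ Y' is the extent of X ∪ Y."""
    return len(extent(x | y)) / len(CONTEXT)


def confidence(x, y):
    """|X' ∩ Y'| / |X'|."""
    return len(extent(x | y)) / len(extent(x))


def holds(x, y):
    """The implication X => Y holds iff X' is included in Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(holds({"m3"}, {"m2"}), holds({"m2"}, {"m3"}))
```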

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric data

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5

• Graphs and molecular structures
• Syntactic trees
• Linked Data

Pattern Structures [Ganter and Kuznetsov, 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as follows:

• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
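The interval pattern example can be checked with a short Python sketch. It assumes the three-object numeric table of the example and uses the convention that $d \sqsubseteq e$ holds exactly when $d \sqcap e = d$; the function names are hypothetical.

```python
from functools import reduce

# Numeric context of the example: object -> values of m1, m2, m3.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of an object: one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity of two descriptions: component-wise convex hull of intervals."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed(d, e):
    """d is subsumed by e iff the meet of d and e is d itself."""
    return meet(d, e) == d


def description(objects):
    """Common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def objects_of(d):
    """Objects whose description is subsumed under d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = description({"g1", "g2"})
print(d)
print(sorted(objects_of(d)))
```

Here g3 is excluded because its value 4 for m1 falls outside the interval [5, 6], so the pair ({g1, g2}, d) is closed in both directions.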

Complex Data

• Numeric data: Interval Pattern Structures [Kaytoue et al., 2011]
• Graphs and molecular structures [Kuznetsov and Samokhin, 2005]
• Syntactic trees [Leeuwenberg et al., 2015]
• Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification


3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: a triple drawn as an edge labeled by a predicate in $U$, going from a subject in $U \cup B$ to an object in $U \cup B \cup L$; U = URIs, B = blank nodes, L = literals]

Example of RDF Graph for Publications

[Figure: NapoliLS97 linked to Amedeo_Napoli by dc:creator, to title1 by dc:title, and to Classification by dc:subject]

To avoid confusion, we will call the objects of FCA entities.

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject : {C1, C2, C7})
s2          (dc:subject : {C6, C8, C9})
s3          (dc:subject : {C4, C5})
s4          (dc:subject : {C4, C7, C8})
s5          (dc:subject : {C8, C9})

[Figure: ACM Computing Classification Taxonomy |T|, rooted at $\top$ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of anti-chains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of that array.

[Figure: taxonomy tree rooted at ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

Euler tour of the tree, built one step at a time (position, visited node, depth D):

Pos    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
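The Euler tour on this slide can be reproduced with a short depth-first traversal. The following is a minimal sketch, assuming the tree is given as a child-list dictionary; the names `TAXONOMY`, `euler_tour` and the label `"TOP"` (standing for ⊤) are illustrative and do not come from the slides.

```python
# Child lists of the example taxonomy; "TOP" stands for the root ⊤.
TAXONOMY = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return (nodes, depths) of a depth-first Euler tour of a rooted tree."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            # Record the parent again each time we come back from a child.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour(TAXONOMY, "TOP")
# 1-based position of the first visit of each node, as used on the slides.
first_position = {}
for index, node in enumerate(nodes, start=1):
    first_position.setdefault(node, index)
```

With this layout, the least common ancestor of two nodes is the node of minimal depth between their positions in the tour, which is why the problem reduces to RMQ on the array D.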

Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Pos    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}

Positions 8 (C12) and 15 (⊤) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so the intersection is {C1, C13}.
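The steps of this example can be sketched in code as follows. This is an illustration under stated assumptions, not the implementation from the cited paper: the RMQ is a plain linear scan, only consecutive positions coming from different sets are queried (an assumption suggested by the A/B subscripts), and the helper names `rmq`, `subsumes` and `intersect` are hypothetical. Positions are 0-based in the code, whereas the slide counts from 1.

```python
# Euler tour and depth array from the slide; "TOP" stands for the root ⊤.
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last 0-based occurrence of each node in the tour.
FIRST, LAST = {}, {}
for index, node in enumerate(NODES):
    FIRST.setdefault(node, index)
    LAST[node] = index


def rmq(i, j):
    """Position of the minimal depth in DEPTH[i..j], inclusive (linear scan)."""
    return min(range(i, j + 1), key=DEPTH.__getitem__)


def subsumes(x, y):
    """True if x is an ancestor of y (or y itself) in the taxonomy."""
    return FIRST[x] <= FIRST[y] and LAST[y] <= LAST[x]


def intersect(a, b):
    """Most specific common subsumers of two sets of taxonomy nodes."""
    tagged = sorted([(FIRST[c], "A") for c in a] + [(FIRST[c], "B") for c in b])
    candidates = set()
    for (i, source_i), (j, source_j) in zip(tagged, tagged[1:]):
        if source_i != source_j:  # only compare an element of A with one of B
            candidates.add(NODES[rmq(i, j)])
    # Drop every candidate that is a proper ancestor of another candidate.
    return {c for c in candidates
            if not any(c != d and subsumes(c, d) for d in candidates)}


result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

Replacing the linear scan in `rmq` by a preprocessed structure changes only the cost of each query, not the result.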

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject : {C1, C2, C7})
s2          (dc:subject : {C6, C8, C9})
s3          (dc:subject : {C4, C5})
s4          (dc:subject : {C4, C7, C8})
s5          (dc:subject : {C8, C9})

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]

[Figure: Pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1 : {C14})
K2   s1, s3, s4      (p1 : {C12})
K3   s1, s4          (p1 : {C7, C12})
K4   s2, s4, s5      (p1 : {C8})
K5   s3, s4          (p1 : {C4})
K6   s1              (p1 : {C1, C2, C7})
K8   s2, s5          (p1 : {C8, C9})
K7   s4              (p1 : {C4, C7, C8})
K9   s3              (p1 : {C4, C5})
K10  s2              (p1 : {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: Pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1 : {C14})
K2   s1, s3, s4      (p1 : {C12})
K3   s1, s4          (p1 : {C7, C12})
K4   s2, s4, s5      (p1 : {C8})
K5   s3, s4          (p1 : {C4})
K6   s1              (p1 : {C1, C2, C7})
K8   s2, s5          (p1 : {C8, C9})
K7   s4              (p1 : {C4, C7, C8})
K9   s3              (p1 : {C4, C5})
K10  s2              (p1 : {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

Graph pattern of the query:

?paper --dc:creator--> ?author
?paper --dc:title--> ?title
?paper --dc:subject--> ?keywords

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$ : ?keywords $\to$ Classification

Graph obtained after applying the mapping:

NapoliLS97 --dc:creator--> Amedeo_Napoli
NapoliLS97 --dc:title--> title1
NapoliLS97 --dc:subject--> Classification
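To make the role of the mapping concrete, the sketch below substitutes variables of a graph pattern by the values a mapping assigns to them. It is only an illustration: variables are assumed to be strings starting with "?", and the names `PATTERN`, `MU` and `apply_mapping` are hypothetical.

```python
# Graph pattern of the query: a list of (subject, predicate, object) patterns.
PATTERN = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

# One solution mapping: a partial function from variables to RDF terms.
MU = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?title": "title1",
    "?keywords": "Classification",
}


def apply_mapping(pattern, mu):
    """Replace every term that mu binds; unbound variables are left as they are."""
    return [tuple(mu.get(term, term) for term in triple) for triple in pattern]


instantiated = apply_mapping(PATTERN, MU)
# A partial mapping binds only some variables, such as mu_1 on the slide.
partially_instantiated = apply_mapping(PATTERN, {"?keywords": "Classification"})
```

Because the mapping is partial, applying only the binding for ?keywords leaves ?paper, ?author and ?title in place.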

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
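The effect of the VIEW BY clause can be sketched as a regrouping of the query results: the chosen variable supplies the entities and the remaining variables supply the attributes. The code below is an assumption-laden illustration that uses the result rows shown for the Web Crawling and RDF query; `view_by` is a hypothetical helper, not part of SPARQL or of the authors' tool.

```python
from collections import defaultdict

VARIABLES = ("title", "keyword", "author")
# Result rows as listed on the slide (duplicates included).
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, entity_variable):
    """Group rows by one variable; the other variables become attributes."""
    context = defaultdict(set)
    for row in rows:
        binding = dict(zip(VARIABLES, row))
        entity = binding.pop(entity_variable)
        # Each remaining (variable, value) pair is one attribute of the entity.
        context[entity].update(binding.items())
    return dict(context)


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
```

The same answers therefore yield two different formal contexts, one whose entities are titles and one whose entities are authors.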


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

?x --rdf:type--> Person
?x --dbp:birthPlace--> Berlin
?x --dbp:birthDate--> ?d, with ?d ≤ 1900

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

?x --dcterms:subject--> FrenchFilm

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

French films described through type and country:

?x --rdf:type--> Film
?x --dbo:hasCountry--> France

SELECT ?x
WHERE {
  ?x rdf:type dbp:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
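One way to picture this scaling step is to treat every (predicate, object) pair as a binary attribute of the subject. The sketch below assumes exactly that; the Person2 triple is a hypothetical addition so that the table has more than one row, and `build_context` and `cross_table` are illustrative names.

```python
from collections import defaultdict

TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    # Hypothetical extra triple, not taken from the slide.
    ("Person2", "rdf:type", "dbo:Scientist"),
]


def build_context(triples):
    """Map each subject to the set of (predicate, object) pairs it has."""
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


def cross_table(context):
    """Return the sorted attribute list and one row of 'x' / '.' per entity."""
    attributes = sorted({a for description in context.values() for a in description})
    rows = {
        entity: ["x" if a in description else "." for a in attributes]
        for entity, description in context.items()
    }
    return attributes, rows


context = build_context(TRIPLES)
attributes, rows = cross_table(context)
```

Each "x" in a row corresponds to one triple, as in the caption of the formal context.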

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule          Confidence  Support  Meaning
d ⇒ c         100%        5        Every scientist has won a Turing Award
c ⇒ d         100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c, d      100%        2        All the people having the field computer science are Turing Award winner scientists
c, d ⇒ a, b   100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d   71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
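The confidence values in this table can be checked with a few lines of code. The context below is a simplified, hypothetical reconstruction restricted to the attributes a, b, c and d: it only assumes that all seven persons have a and b, and that Person1 to Person5 also have c and d, which is consistent with the supports stated above. The helper names are illustrative.

```python
# Hypothetical context restricted to the attributes a, b, c, d.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})


def extent(attributes):
    """Entities whose description contains all the given attributes."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def confidence(premise, conclusion):
    """Fraction of the entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    if not covered:
        return 0.0
    return len(extent(premise | conclusion)) / len(covered)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(extent(premise) - extent(conclusion))


conf_cd_ab = confidence({"c", "d"}, {"a", "b"})   # implication, confidence 1.0
conf_ab_cd = confidence({"a", "b"}, {"c", "d"})   # association rule, 5 / 7
incomplete = needs_completion({"a", "b"}, {"c", "d"})
```

The entities returned by `needs_completion` are the ones whose descriptions would have to be completed for the rule to become a definition.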

Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data, we use Pattern Structures

350GT --dc:subject--> dbpc:Sports_cars
350GT --dc:subject--> dbpc:Lamborghini_vehicles
350GT --rdf:type--> dbo:Automobile
350GT --dbo:manufacturer--> dbp:Lamborghini
350GT --dbo:productionYear--> 1964 (xsd:date)

Why Pattern Structures

Columns: K_A (a, b), K_B (c), K_C (d), K_D (e), K_E (f, g), K_dbo:productionStartYear

Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Columns: K_A (a, b), K_B (c), K_C (d), K_D (e), K_E (f, g), K_dbo:productionStartYear

Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Let $A_1 = \{350GT, 400GT, Islero\}$, then
• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
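The numeric part of this example can be replayed with interval patterns, where the similarity of two intervals is their convex hull. The sketch below covers only the production-year column and is an illustration, not the authors' implementation: entities without a value (Aventador Roadster) are simply left out, and `meet`, `describe` and `extent` are hypothetical names.

```python
from functools import reduce

# Production start years from the heterogeneous context, as [low, high] intervals.
YEARS = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
}


def meet(first, second):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(first[0], second[0]), max(first[1], second[1]))


def describe(entities):
    """Common interval description of a set of entities."""
    return reduce(meet, (YEARS[entity] for entity in entities))


def extent(interval):
    """Entities whose own interval lies inside the given interval."""
    low, high = interval
    return {entity for entity, (lo, hi) in YEARS.items() if low <= lo and hi <= high}


A1 = {"350GT", "400GT", "Islero"}
description = describe(A1)   # interval shared by the three cars
closure = extent(description)
```

Intersecting this extent with the one obtained from the binary attributes gives the extent of the heterogeneous concept, as in the last derivation step on the slide.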

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x               x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$ (here $y$ ranges over the maximal elements of $Idl$)
• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$
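The infimum of two antichains can be computed directly from this definition, without any RMQ machinery. The sketch below assumes that the order is the taxonomy order in which an ancestor is below its descendants (so $m \leq ac$ means that m subsumes ac), and it reuses the small taxonomy from the Euler tour example; `PARENT`, `up_set`, `ideal` and `infimum` are illustrative names.

```python
# Parent of each node in the example taxonomy; "TOP" stands for the root ⊤.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all its ancestors (everything that subsumes it)."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def ideal(antichain):
    """Order ideal generated by an antichain: all subsumers of its elements."""
    generated = set()
    for node in antichain:
        generated |= up_set(node)
    return generated


def infimum(first, second):
    """Maximal (most specific) elements of the intersection of the two ideals."""
    common = ideal(first) & ideal(second)
    return {m for m in common
            if not any(m != n and m in up_set(n) for n in common)}


result = infimum({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

This direct computation visits whole ancestor chains for every element, which is the cost that the reduction to RMQ is meant to avoid.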

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and O(1) time per query (where n is the number of elements in the array)
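The RMQ problem described above can be sketched in Python. This is a minimal sketch, not the implementation used in the talk: it uses a sparse table, which needs O(n log n) preprocessing rather than the O(n) bound quoted on the slide, but still answers each query in O(1). The names `build_sparse_table` and `rmq` are illustrative, and positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] = 0-based index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """Return the 1-based position of the minimum in positions lo..hi (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    i, j = lo - 1, hi - 1              # switch to 0-based indices
    k = (j - i + 1).bit_length() - 1   # largest power of two fitting in the range
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


array = [2, 1, 0, 3, 2]                # the array shown on the slide
table = build_sparse_table(array)
print(rmq(array, table, 1, 5))         # position of the overall minimum
```

A query covers its range with two overlapping blocks whose length is a power of two, which is why no loop is needed at query time.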

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9}, we have A = {4, 12, 20} and B = {4, 18, 22} as positions in the array D:

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤

An RMQ is computed for pairs of positions taken from A and B, and the result set grows step by step:

- RMQ(4, 4) = 4, giving {4}
- RMQ(4, 18) = 15, giving {4, 15}
- RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the antichain {C1, C13}

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree rooted at ⊤ with children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
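The filter-based intersection can be sketched as follows. This is a minimal sketch under stated assumptions: the `PARENT` map is read off the traversal array D used on the RMQ slides, `"TOP"` stands for ⊤, and the helper names are illustrative rather than taken from the talk.

```python
# Parent map of the example tree; "TOP" stands for the root (⊤).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """All elements larger than or equal to `node` (the node and its ancestors)."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Filter generated by an antichain: union of the upward closures."""
    result = set()
    for node in antichain:
        result |= ancestors_or_self(node)
    return result


def most_specific(nodes):
    """Keep the elements that are not a proper ancestor of another element."""
    return {n for n in nodes
            if not any(n != m and n in ancestors_or_self(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(most_specific(common)))
```

Each filter can be as large as the whole tree, which is the cost that the RMQ-based approach avoids.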

Complexity

Complexity of Naive Approach

The number of RMQs is O(|A||B|)

Complexity of Improved Approach

The cardinality of A ∪ B is at most |A| + |B|, hence the number of pairs of consecutive elements is O(|A| + |B|), and thus the number of RMQs over consecutive elements is O(|A| + |B|)

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than the O(|Leaves(T)|) needed for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 17: Interactive Knowledge Discovery over Web of Data

Formal Concept Analysis

[Ganter and Wille 1999]

• A formal context is a triple (G, M, I)
  - G is a set of objects
  - M is a set of attributes
  - I ⊆ G × M is a binary relation

• Galois connection

  A′ = {m ∈ M | ∀g ∈ A, (g, m) ∈ I} for A ⊆ G

  B′ = {g ∈ G | ∀m ∈ B, (g, m) ∈ I} for B ⊆ M

• (A, B) is a formal concept with extent A = B′ and intent B = A′

• Reduced labeling
  - intents are inherited from top to bottom (top-down)
  - extents are inherited from bottom to top (bottom-up)

     m1  m2  m3
g1   ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

[Figure: concept lattice of this context]

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for g ∈ G
  - The object concept is given as (g″, g′)
  - e.g. ({g2, g5}, {m1, m2})

• Attribute concept for m ∈ M
  - The attribute concept is given as (m′, m″)
  - e.g. ({g1, g2, g5}, {m1})
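The derivation operators and the two example concepts can be checked with a short Python sketch. The cross positions in `CONTEXT` are an assumption: they are reconstructed from the examples on these slides (the object concept of g2, the attribute concept of m1, and the support and confidence values of the rules that follow). All function names are illustrative.

```python
# Example context; cross positions reconstructed from the slide examples.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B' : the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}


def object_concept(g):
    """(g'', g') for an object g."""
    return extent(intent({g})), intent({g})


def attribute_concept(m):
    """(m', m'') for an attribute m."""
    return extent({m}), intent(extent({m}))


print(object_concept("g2"))
print(attribute_concept("m1"))
```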

Association Rules and Implications

Let (G, M, I) be a formal context and X, Y ⊆ M

• X → Y is an association rule with
  - Support σ(X → Y) = |X′ ∩ Y′| / |G|
  - Confidence conf(X → Y) = |X′ ∩ Y′| / |X′|

• The implication X ⇒ Y holds iff X′ ⊆ Y′

Association rule: m2 → m3
  σ(m2 → m3) = 3/5 = 60%
  conf(m2 → m3) = 3/4 = 75%

Implication: m3 ⇒ m2
  conf(m3 ⇒ m2) = 3/3 = 100%
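The support and confidence values above can be reproduced with a small Python sketch. The cross positions in `CONTEXT` are an assumption, chosen to be consistent with the figures quoted on the slide, and the function names are illustrative.

```python
# Example context; cross positions are assumed, consistent with the slide values.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X' : the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """|X' ∩ Y'| / |X'|"""
    return len(extent(x) & extent(y)) / len(extent(x))


def is_implication(x, y):
    """X ⇒ Y holds iff X' ⊆ Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(is_implication({"m3"}, {"m2"}))
```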

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric data

       m1  m2  m3
  g1   5   7   6
  g2   6   8   4
  g3   4   8   5
  g4   4   9   8
  g5   5   8   5

• Graphs and molecular structures

• Syntactic trees

• Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure (G, (D, ⊓), δ) is composed of
  - G, a set of objects
  - (D, ⊓), a set of descriptions with a similarity operation ⊓ over them
  - δ, a mapping such that δ(g) ∈ D describes object g

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• δ(g1) = ⟨[5, 5], [7, 7], [6, 6]⟩

• δ(g2) = ⟨[6, 6], [8, 8], [4, 4]⟩

• [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]

• δ(g1) ⊓ δ(g2) = ⟨[5, 6], [7, 8], [4, 6]⟩
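The similarity operation on interval descriptions can be sketched in a few lines of Python. The representation of a description as a tuple of `(low, high)` pairs and the names `delta`, `meet_intervals` and `meet` are illustrative choices, not taken from the talk.

```python
def delta(row):
    """Describe an object by one degenerate interval [v, v] per numeric attribute."""
    return tuple((v, v) for v in row)


def meet_intervals(i1, i2):
    """[a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]"""
    (a1, b1), (a2, b2) = i1, i2
    return (min(a1, a2), max(b1, b2))


def meet(d1, d2):
    """Component-wise similarity of two interval descriptions."""
    return tuple(meet_intervals(x, y) for x, y in zip(d1, d2))


d_g1 = delta((5, 7, 6))
d_g2 = delta((6, 8, 4))
print(meet(d_g1, d_g2))
```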

Pattern Structures

The Galois connection for (G, (D, ⊓), δ) is defined as:

• The maximal description representing the similarity of a set of objects

  A□ = ⊓_{g ∈ A} δ(g) for A ⊆ G

• The maximal set of objects sharing a given description

  d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)

Pattern Concept

• {g1, g2}□ = ⟨[5, 6], [7, 8], [4, 6]⟩

• ⟨[5, 6], [7, 8], [4, 6]⟩□ = {g1, g2}
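Both derivation operators can be sketched in Python for the three-object table used with the interval example. This is a minimal sketch: the subsumption test uses the standard identity d ⊑ e iff d ⊓ e = d, and the names `common_description` and `matching_objects` are illustrative.

```python
from functools import reduce

# Numeric table restricted to the three objects of the interval example.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of object g: one degenerate interval per attribute."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Component-wise convex hull of two interval descriptions."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed_by(d, e):
    """d ⊑ e holds iff d ⊓ e = d."""
    return meet(d, e) == d


def common_description(objects):
    """A□ : the meet of the descriptions of all objects in a non-empty set A."""
    return reduce(meet, (delta(g) for g in objects))


def matching_objects(d):
    """d□ : all objects whose description is subsumed by d."""
    return {g for g in DATA if subsumed_by(d, delta(g))}


d = common_description(["g1", "g2"])
print(d, matching_objects(d))
```

Applying the two operators in turn returns the starting set {g1, g2}, which is what makes the pair a pattern concept.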

Complex Data

• Numeric data: Interval Pattern Structures [Kaytoue et al 2011]

• Graphs and molecular structures [Kuznetsov and Samokhin 2005]

• Syntactic trees [Leeuwenberg et al 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object

[Figure: a subject node (U ∪ B) linked to an object node (U ∪ B ∪ L) by a predicate edge (U); U = URIs, B = blank nodes, L = literals]

Example of RDF Graph for Publications

[Figure: RDF graph in which NapoliLS97 is linked to title1 by dc:title, to Amedeo_Napoli by dc:creator and to Classification by dc:subject]

To avoid confusion, we will call the objects of FCA entities

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T over ⊤, the classes C10 to C15 and C1, C2, C4 to C9]

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling

• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes

• Range Minimum Query: an implementation

Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. Revisiting Pattern Structures for Structured Attribute Sets. International Conference on Concept Lattices and Their Applications, 2015

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array

[Figure: taxonomy tree rooted at ⊤ with children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

The array D is filled during a depth-first traversal of the tree: the depth of a node is appended each time the node is visited, starting with D = [0] for ⊤, then [0 1] for ⊤ C12, [0 1 2] for ⊤ C12 C10, [0 1 2 3] for ⊤ C12 C10 C1, [0 1 2 3 2] for ⊤ C12 C10 C1 C10, and so on until the traversal returns to ⊤:

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
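The construction of D can be sketched as a recursive traversal in Python. This is a minimal sketch: the `CHILDREN` map is read off the node sequence shown under D, `"TOP"` stands for ⊤, and `euler_tour` is an illustrative name.

```python
# Children of each inner node of the example tree; "TOP" stands for ⊤.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (depths, labels), recording a node on entry and after each child."""
    depths, labels = [], []

    def visit(node, depth):
        depths.append(depth)
        labels.append(node)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            depths.append(depth)   # back at `node` after finishing `child`
            labels.append(node)

    visit(root, 0)
    return depths, labels


D, NODES = euler_tour("TOP")
print(D)
print([NODES.index(c) + 1 for c in ("C1", "C5", "C8")])  # 1-based positions
```

The least common ancestor of two nodes is then the node of minimal depth between their positions in D, which is what turns the LCA problem into an RMQ.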

Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4A, 4B, 12A, 18B, 20A, 22B}

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}
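The steps above can be sketched in Python. This is a sketch under stated assumptions rather than the implementation from the talk: `rmq` is a plain linear scan standing in for a constant-time structure, only consecutive positions coming from different sets are queried (on this example every consecutive pair qualifies), and the final reduction to the most specific nodes is done with an ancestor test that is itself an RMQ. `"TOP"` stands for ⊤.

```python
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()


def rmq(lo, hi):
    """1-based position of the minimum depth in positions lo..hi (linear scan)."""
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def is_ancestor(p, q):
    """True if the node at position p is an ancestor of, or equal to, the node at q."""
    lo, hi = min(p, q), max(p, q)
    return NODES[rmq(lo, hi) - 1] == NODES[p - 1]


def intersect(a_positions, b_positions):
    """Antichain (as node names) obtained by intersecting two antichains."""
    tagged = sorted([(p, "A") for p in a_positions] +
                    [(p, "B") for p in b_positions])
    candidates = set()
    for (p, s), (q, t) in zip(tagged, tagged[1:]):
        if s != t:                      # one element from each antichain
            candidates.add(rmq(p, q))
    nodes = {NODES[c - 1] for c in candidates}
    first = {n: NODES.index(n) + 1 for n in nodes}
    # Drop every node that is a proper ancestor of another candidate.
    return {n for n in nodes
            if not any(m != n and is_ancestor(first[n], first[m]) for m in nodes)}


print(sorted(intersect([4, 12, 20], [4, 18, 22])))
```

Positions 19 and 21 both denote C13, so returning node names avoids having to pick one of them.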

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy over ⊤, the classes C10 to C15 and C1, C2, C4 to C9]

[Figure: pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of the pattern concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Mehwish Alam, Amedeo Napoli. Interactive Exploration over RDF Data using Formal Concept Analysis. IEEE International Conference on Data Science and Advanced Analytics, 2015


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, in which ?paper is linked to ?title by dc:title, to ?author by dc:creator and to ?keywords by dc:subject]

Mapping μ : V → U

Given the set of URIs U and a set of variables V, a mapping μ is a partial function μ : V → U, e.g. μ1 : ?keywords → Classification

[Figure: matching RDF graph, in which NapoliLS97 is linked to title1 by dc:title, to Amedeo_Napoli by dc:creator and to Classification by dc:subject]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• Ev = {?title}

• Av = {?author, ?keyword}

• G = μ(Ev) = {title1, title2, ...}

• M1 = μ(Av1) = {author1, author2, ...}

• M2 = μ(Av2) = {Web Crawling, RDF, ...}

Mehwish Alam, Amedeo Napoli. Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. International Conference on Concept Lattices and Their Applications, 2014
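How a VIEW BY clause turns a result table into a formal context can be sketched in Python. This is a hypothetical illustration, not the LBVA implementation: `view_by` is an invented helper, the rows are the ones shown in the result table, and each attribute is kept as a (variable, value) pair so that authors and keywords stay distinguishable.

```python
# Result rows of the query, in the column order (title, keyword, author).
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, object_var):
    """Build {object: set of (variable, value)} with `object_var` as the object column."""
    k = columns.index(object_var)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[k], set())
        for name, value in zip(columns, row):
            if name != object_var:
                attributes.add((name, value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Changing the `object_var` argument gives a different perspective on the same answers, since another column then plays the role of the entities.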

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a value ≤ 1900 by dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x linked to FrenchFilm by dcterms:subject, and alternatively to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates            Objects
Index  URI            Index  URI
A      dc:subject     a      dbpc:Computer_Scientists
                      b      dbpc:Turing_Award_Laureates
B      dbp:award      c      dbp:TuringAward
C      rdf:type       d      dbo:Scientist
D      dbp:field      e      dbp:Computer Sciences
E      dbp:birthPlace f      dbo:UnitedStates
                      g      dbo:UnitedKingdom

[Table: formal context with the entities Person1 to Person7 as rows and the attributes a to g, grouped under the predicates A to E, as columns]

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>
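The scaling from triples to a formal context can be sketched in Python: every (predicate, object) pair becomes one binary attribute of its subject. This is a minimal sketch under that assumption, using only the four triples listed above; `triples_to_context` and `has_cross` are illustrative names.

```python
# The four example triples, written as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def triples_to_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


def has_cross(context, entity, predicate, obj):
    """True if the context holds a cross for this entity and attribute."""
    return (predicate, obj) in context.get(entity, set())


context = triples_to_context(TRIPLES)
print(len(context["Person1"]))
print(has_cross(context, "Person1", "dbp:award", "dbp:TuringAward"))
```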

Getting definitions from implications and association rules

• If X ⇒ Y and Y ⇒ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⇒ Y holds and Y → X has high confidence, we may be in the presence of incompleteness in the data

Rule        Confidence  Support  Meaning
d ⇒ c       100%        5        Every scientist has won a Turing Award
c ⇒ d       100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c, d    100%        2        Every person having the field computer science is a scientist who has won a Turing Award
c, d ⇒ a, b 100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d 71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - c, d ⇒ a, b
  - conf({a, b} → {c, d}) = 0.71

• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: a, b ≡ c, d, i.e. a, b ⇒ c, d and c, d ⇒ a, b
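The reasoning on this slide can be sketched in Python. The context below is a hypothetical reconstruction restricted to the attributes a, b, c and d: it is chosen only to agree with the supports and confidences in the rule table (seven entities with a and b, five of them with c and d), and the attributes e, f and g are left out because their crosses are not given. The function names are illustrative.

```python
# Hypothetical context consistent with the rule table (attributes a-d only).
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})


def extent(attributes):
    """Entities that have every attribute in `attributes`."""
    return {e for e, attrs in CONTEXT.items() if attributes <= attrs}


def confidence(x, y):
    """conf(X → Y) = |X' ∩ Y'| / |X'|"""
    return len(extent(x) & extent(y)) / len(extent(x))


def completion_candidates(x, y):
    """Entities satisfying X but not Y: incomplete if X ≡ Y is accepted as a definition."""
    return extent(x) - extent(y)


ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))                      # c, d ⇒ a, b holds
print(round(confidence(ab, cd), 2))            # a, b → c, d is only an association rule
print(sorted(completion_candidates(ab, cd)))   # entities whose description lacks c and d
```

If a domain expert accepts a, b ≡ c, d, the candidates returned by the last call are the entities to which c and d would be added.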

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow going from Reference Universe to Formal Context, Mining Implications and Ranking Implications, then the test "Can a rule be a definition X ≡ Y?", whose answer Yes leads to Complete Data]

Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. Mining Definitions from RDF Annotations Using Formal Concept Analysis. International Joint Conference on Artificial Intelligence, 2015

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team

• Altering the navigation space

• Navigation across points of view

• Hiding non-interesting parts of the lattice

• Search capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are both association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account

• Perform large-scale experiments

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph for the entity 350GT, with edges dc:subject to dbpc:Sports_cars, dc:subject to dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date)]

Why Pattern Structures

K_A K_B K_C K_D K_E K_dbo:productionStartYear
a b c d e f g

Reventon × × × × × × ⟨[2008, 2008]⟩
Countach × × × × × ⟨[1974, 1974]⟩
350GT × × × × × ⟨[1963, 1963]⟩
400GT × × × × ⟨[1965, 1965]⟩
Islero × × × × ⟨[1967, 1967]⟩
Veneno × × ⟨[2012, 2012]⟩
Aventador Roadster × × -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate
  - if $\mathrm{range}(p) \subseteq U$, $p$ is an object relation
  - if $\mathrm{range}(p) \subseteq L$, $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

K_A K_B K_C K_D K_E K_dbo:productionStartYear
a b c d e f g

Reventon × × × × × × ⟨[2008, 2008]⟩
Countach × × × × × ⟨[1974, 1974]⟩
350GT × × × × × ⟨[1963, 1963]⟩
400GT × × × × ⟨[1965, 1965]⟩
Islero × × × × ⟨[1967, 1967]⟩
Veneno × × ⟨[2012, 2012]⟩
Aventador Roadster × × -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^{*}$, organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x   .   .   .   x   .   .
s2          .   .   .   .   x   .   x   x
s3          .   .   x   x   .   .   .   .
s4          .   .   x   .   .   x   x   .
s5          .   .   .   .   .   .   x   x

[Figure: taxonomy over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
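To make the RMQ problem concrete, here is a minimal sketch using a sparse table. Note the assumption: a sparse table needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted on the slide, but it still answers each query in $O(1)$. The function names are illustrative only, and indices are 0-based in the code while the slide counts positions from 1.

```python
def build_sparse_table(values):
    """table[j][i] holds the index of a minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Index of a minimal value in values[lo..hi] (0-based, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]          # the array from the slide
table = build_sparse_table(array)
# Slide positions are 1-based, so add 1 to the 0-based index.
print(rmq(array, table, 0, 4) + 1)   # minimum over positions 1..5
print(rmq(array, table, 3, 4) + 1)   # minimum over positions 4..5
```

The two queries overlap two precomputed blocks of length $2^k$, which is why no loop is needed at query time.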

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Pos    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

An RMQ is computed for every pair of positions (one from A, one from B), for example:

RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}

An associated scaling

• Scale the antichains to the corresponding filters

• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain

[Figure: taxonomy tree over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C6, C13, C7, C8, C9]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
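The scaling-based intersection can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `PARENT` map is read off the Euler tour shown on the RMQ slides, `TOP` stands for ⊤, and all function names are hypothetical.

```python
# Child -> parent edges of the taxonomy (assumed from the Euler tour in the slides).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(n) for n in antichain))


def most_specific(nodes):
    """Keep the nodes that are not a strict ancestor of another node in the set."""
    return {x for x in nodes
            if not any(x in up_set(y) - {y} for y in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common))
print(sorted(most_specific(common)))
```

Each filter can contain up to $|T|$ nodes, which is the cost the complexity discussion attributes to the scaling approach.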

Mehwish Alam Interactive Knowledge Discovery over Web of Data 70

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|\mathrm{Leaves}(T)|)$ for intersecting antichains directly

Mehwish Alam Interactive Knowledge Discovery over Web of Data 71

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|\mathrm{Leaves}(T)|)$ for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

Mehwish Alam Interactive Knowledge Discovery over Web of Data 72

Page 18: Interactive Knowledge Discovery over Web of Data

Formal Concept Analysis

[Ganter and Wille 1999]

• Object concept for $g \in G$
  - The object concept is given as $(g'', g')$
  - e.g. $(\{g_2, g_5\}, \{m_1, m_2\})$

• Attribute concept for $m \in M$
  - The attribute concept is given as $(m', m'')$
  - e.g. $(\{g_1, g_2, g_5\}, \{m_1\})$

     m1  m2  m3
g1   ×   .   .
g2   ×   ×   .
g3   .   ×   ×
g4   .   ×   ×
g5   ×   ×   ×

Concept Lattice
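The two derivation operators and the object and attribute concepts above can be checked with a short sketch. The cross positions in `CONTEXT` are inferred from the examples on these slides (the object concept of g2, the attribute concept of m1, and the support and confidence values of the next slide); the function names are illustrative.

```python
# Formal context (G, M, I): each object mapped to the attributes it has.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : attributes shared by every object in the set."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B' : objects that have every attribute in the set."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def object_concept(g):
    """(g'', g')"""
    g_prime = intent({g})
    return extent(g_prime), g_prime


def attribute_concept(m):
    """(m', m'')"""
    m_prime = extent({m})
    return m_prime, intent(m_prime)


print(object_concept("g2"))
print(attribute_concept("m1"))
```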

Mehwish Alam Interactive Knowledge Discovery over Web of Data 11

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$

• $X \rightarrow Y$ is an association rule with
  - Support $\sigma(X \rightarrow Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $\mathrm{conf}(X \rightarrow Y) = \frac{|X' \cap Y'|}{|X'|}$

• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association Rule

$m_2 \rightarrow m_3$
$\sigma(m_2 \rightarrow m_3) = 3/5 = 60\%$
$\mathrm{conf}(m_2 \rightarrow m_3) = 3/4 = 75\%$

Implication

$m_3 \Rightarrow m_2$
$\mathrm{conf}(m_3 \Rightarrow m_2) = 3/3 = 100\%$

     m1  m2  m3
g1   ×   .   .
g2   ×   ×   .
g3   .   ×   ×
g4   .   ×   ×
g5   ×   ×   ×
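The support and confidence values of this example can be reproduced with a few lines. This is a sketch under the assumption that the context is the five-object table above; `extent`, `support`, `confidence`, and `is_implication` are illustrative names.

```python
from fractions import Fraction

# Same formal context as in the table: object -> set of attributes.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attributes):
    """X' : objects that have every attribute in X."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(x, y):
    """|X' ∩ Y'| / |G|"""
    return Fraction(len(extent(x) & extent(y)), len(CONTEXT))


def confidence(x, y):
    """|X' ∩ Y'| / |X'|"""
    return Fraction(len(extent(x) & extent(y)), len(extent(x)))


def is_implication(x, y):
    """X => Y holds iff X' is a subset of Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(confidence({"m3"}, {"m2"}), is_implication({"m3"}, {"m2"}))
```

An implication is therefore just an association rule whose confidence is exactly 1.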

Mehwish Alam Interactive Knowledge Discovery over Web of Data 12

Complex Data

Formal Concept Analysis cannot deal with such complex data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Numeric Data

Graphs and Molecular Structure

Syntactic Tree

Linked Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 13

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Mehwish Alam Interactive Knowledge Discovery over Web of Data 14

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects

$A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
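The pattern concept above can be verified with a small sketch of an interval pattern structure over the three-object table (g1, g2, g3). The names are illustrative; subsumption is implemented through the usual identity that $d \sqsubseteq e$ holds exactly when $d \sqcap e = d$.

```python
from functools import reduce

# Numeric table restricted to g1..g3, as on the pattern-structure slide.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of an object: one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity of two descriptions: component-wise convex hull of intervals."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed(d, e):
    """d is subsumed by e iff meet(d, e) == d."""
    return meet(d, e) == d


def description_of(objects):
    """Common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def objects_of(d):
    """All objects whose description is covered by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = description_of({"g1", "g2"})
print(d)
print(objects_of(d))
```

Here g3 is excluded because its value 4 for m1 falls outside the interval [5, 6].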

Mehwish Alam Interactive Knowledge Discovery over Web of Data 15

Complex Data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Interval Pattern Structures [Kaytoue et al 2011]

Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

Syntactic Tree [Leeuwenberg et al 2015]

Linked Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 16

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Mehwish Alam Interactive Knowledge Discovery over Web of Data 17

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object

[Figure: a triple drawn as Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$), where U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: RDF graph in which NapoliLS97 is linked by dc:creator to Amedeo_Napoli, by dc:title to title1, and by dc:subject to Classification]

To avoid confusion, we will call objects in FCA entities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 20

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

ACM Computing Classification Taxonomy T

Mehwish Alam Interactive Knowledge Discovery over Web of Data 21

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 22

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of that array

[Figure: taxonomy tree over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C6, C13, C7, C8, C9; the array D below is built step by step by an Euler tour of this tree, recording the depth of each visited node]

D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Pos    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
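A minimal sketch of how the depth array D can be produced and used for LCA queries. The `CHILDREN` map is an assumption read off the node sequence of the tour above, `TOP` stands for ⊤, and the range minimum is found by a plain linear scan for brevity rather than by a constant-time RMQ structure.

```python
# Taxonomy as parent -> ordered children (assumed from the Euler tour on the slide).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Visit the tree depth-first, recording a node each time it is entered or re-entered."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # back at the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, D = euler_tour("TOP")

# First (1-based) position of every node in the tour.
FIRST = {}
for position, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, position)


def lca(x, y):
    """Least common ancestor: the shallowest node between the two first positions."""
    lo, hi = sorted((FIRST[x], FIRST[y]))
    best = min(range(lo, hi + 1), key=lambda p: D[p - 1])
    return NODES[best - 1]


print(D)
print(lca("C1", "C5"), lca("C7", "C9"), lca("C5", "C7"))
```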

Mehwish Alam Interactive Knowledge Discovery over Web of Data 23

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$

D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Pos    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}
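The steps above can be sketched as follows, using the arrays exactly as they appear on the slide (`TOP` stands for ⊤). Two points are assumptions of this sketch rather than something the slide states: the range minimum is computed by a linear scan instead of a constant-time structure, and the final reduction from {4, 8, 15, 19, 21} to {4, 21} is implemented by discarding every candidate node that is a strict ancestor of another candidate.

```python
# Depth array and node labels of the Euler tour (positions are 1-based on the slide).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "TOP"]


def first_position(node):
    """1-based position of the first occurrence of a node in the tour."""
    return NODES.index(node) + 1


def rmq(i, j):
    """1-based position of the minimum of D between positions i and j (inclusive)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def is_strict_ancestor(x, y):
    """x is a strict ancestor of y iff their least common ancestor is x itself."""
    return x != y and NODES[rmq(first_position(x), first_position(y)) - 1] == x


def intersect(a, b):
    """Candidate positions from consecutive RMQs, then the most specific nodes."""
    positions = sorted([first_position(n) for n in a] + [first_position(n) for n in b])
    candidates = [rmq(p, q) for p, q in zip(positions, positions[1:])]
    nodes = {NODES[c - 1] for c in candidates}
    kept = {x for x in nodes if not any(is_strict_ancestor(x, y) for y in nodes)}
    return candidates, kept


candidates, result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(candidates)
print(sorted(result))
```

Only $|A| + |B| - 1$ range queries are issued, one per pair of consecutive positions in the merged list.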

Mehwish Alam Interactive Knowledge Discovery over Web of Data 24

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy over the classes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

[Figure: pattern concept lattice over the concepts K0 to K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Mehwish Alam Interactive Knowledge Discovery over Web of Data 25

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach ()

Mehwish Alam Interactive Knowledge Discovery over Web of Data 26

What we did so far

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

How to allow feedback from domain expert

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice over the concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 28

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs


SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: graph pattern of the query: ?paper linked to ?title by dc:title, to ?author by dc:creator and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification

[Figure: matched graph: NapoliLS97 linked to title1 by dc:title, to Amedeo_Napoli by dc:creator and to Classification by dc:subject]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author
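The effect of VIEW BY can be sketched as a regrouping of the answer rows: the chosen variable supplies the entities and the remaining variables supply the attributes. The code below is an illustrative sketch only; the function name `view_by`, the row values and the (column, value) attribute encoding are assumptions and are not taken from the cited paper.

```python
from collections import defaultdict

# Answer rows as (title, keyword, author); the values are placeholders.
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows, columns, entity_column):
    """Group answer rows into a formal context: entity -> set of attributes."""
    idx = columns.index(entity_column)
    context = defaultdict(set)
    for row in rows:
        for col, value in zip(columns, row):
            if col != entity_column:
                # Each attribute remembers the variable it came from.
                context[row[idx]].add((col, value))
    return dict(context)


by_title = view_by(ROWS, COLUMNS, "title")    # VIEW BY ?title
by_author = view_by(ROWS, COLUMNS, "author")  # VIEW BY ?author
```

The same answer set therefore yields two different contexts, and hence two different lattices, depending on the perspective chosen.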


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: graph pattern: ?x has dcterms:subject FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

Problem Statement

French Films

[Figure: graph pattern: ?x has dcterms:subject FrenchFilm, rdf:type Film and dbo:hasCountry France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientists>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Formal context over the attributes a b (A), c (B), d (C), e (D), f g (E):

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
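The construction can be sketched in a few lines: every subject becomes an entity, every (predicate, object) pair becomes an attribute, and every triple contributes one cross. This is a minimal sketch of that scaling; the function name `build_context` and the returned tuple layout are assumptions, and the triples are the four listed for Person1.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def build_context(triples):
    """Return (entities, attributes, incidence) with one cross per triple."""
    entities = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence


entities, attributes, incidence = build_context(TRIPLES)
for g in entities:
    # Print one row of the cross table per entity.
    row = ["x" if (g, m) in incidence else "." for m in attributes]
    print(g, " ".join(row))
```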

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$     100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$             71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
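The completion idea can be sketched as follows: when one direction of a rule is an implication and the other direction has high confidence, the entities that break the second direction are the candidates for completion. The context below is an assumption that keeps only the attributes a, b, c and d (e, f and g are left out) and is chosen so that it reproduces the 71% of the running example; the function names are hypothetical as well.

```python
# Assumed context restricted to a, b, c, d (attributes e, f, g are omitted).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(x, y):
    """Confidence of the association rule x -> y."""
    return len(extent(x | y)) / len(extent(x))


def completion_candidates(x, y):
    """Entities that satisfy x but lack part of y."""
    return sorted(extent(x) - extent(y))


ab, cd = {"a", "b"}, {"c", "d"}
print(round(confidence(ab, cd), 2), confidence(cd, ab))
print(completion_candidates(ab, cd))
```

Here the rule from {c, d} to {a, b} is an implication, the converse has confidence 5/7, and the two entities returned are the ones whose descriptions would need the missing attributes for the definition to hold.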

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames     Smartphones        Countries
Restriction   dc:subject         dc:subject     dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type       rdf:type           rdf:type
              dc:subject         dc:subject     dc:subject         dc:subject
              bodyStyle          cp             manufacturer       language
              transmission       developer      operativeSystem    govenmentType
              assembly           requirement    developer          leaderType
              designer           genre          cpu                foundingDate
              layout             releaseDate                       gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset          Cars    Videogames   Smartphones   Countries
Subjects         529     655          363           3153
Objects          1291    3265         495           8315
Triples          12519   20146        4710          50000
Concepts         14657   31031        1232          13754
Exec time [s]    1732    1714         07            5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

[Figure: process of interactively exploring RDF data]

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data, we use Pattern Structures

[Figure: RDF graph around the entity 350GT, with the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and the edges dc:subject, dc:subject, rdf:type, dbo:manufacturer and dbo:productionYear]

Why Pattern Structures

Contexts KA (a, b), KB (c), KC (d), KD (e), KE (f, g) and K_dbo:productionStartYear:

Reventon            × × × × × ×   $\langle [2008, 2008] \rangle$
Countach            × × × × ×     $\langle [1974, 1974] \rangle$
350GT               × × × × ×     $\langle [1963, 1963] \rangle$
400GT               × × × ×       $\langle [1965, 1965] \rangle$
Islero              × × × ×       $\langle [1967, 1967] \rangle$
Veneno              × ×           $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts KA (a, b), KB (c), KC (d), KD (e), KE (f, g) and K_dbo:productionStartYear:

Reventon            × × × × × ×   $\langle [2008, 2008] \rangle$
Countach            × × × × ×     $\langle [1974, 1974] \rangle$
350GT               × × × × ×     $\langle [1963, 1963] \rangle$
400GT               × × × ×       $\langle [1965, 1965] \rangle$
Islero              × × × ×       $\langle [1967, 1967] \rangle$
Veneno              × ×           $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×           -

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
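The computation for $A_1$ can be replayed with a short sketch: close the set separately in the binary part and in the interval part, then intersect the two extents. The production years are taken from the table; the binary part is an assumption, since only the attributes a to d are modelled and the two-cross rows are assumed to carry a and b. The function names are hypothetical.

```python
# Assumed binary part (attributes e, f, g are left out).
BINARY = {
    "Reventon": {"a", "b", "c", "d"},
    "Countach": {"a", "b", "c", "d"},
    "350GT": {"a", "b", "c", "d"},
    "400GT": {"a", "b", "c", "d"},
    "Islero": {"a", "b", "c", "d"},
    "Veneno": {"a", "b"},
    "Aventador Roadster": {"a", "b"},
}
# Production start years from the table (Aventador Roadster has no value).
YEAR = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
        "400GT": 1965, "Islero": 1967, "Veneno": 2012}


def binary_closure(entities):
    """Shared attributes of the entities and every entity having them all."""
    shared = set.intersection(*(BINARY[g] for g in entities))
    return shared, {g for g, attrs in BINARY.items() if shared <= attrs}


def interval_closure(entities):
    """Smallest interval covering the years and every entity inside it."""
    years = [YEAR[g] for g in entities]
    lo, hi = min(years), max(years)
    return (lo, hi), {g for g, y in YEAR.items() if lo <= y <= hi}


A1 = {"350GT", "400GT", "Islero"}
attrs, ext_binary = binary_closure(A1)
interval, ext_interval = interval_closure(A1)
closure = ext_binary & ext_interval  # heterogeneous closure of A1
print(sorted(attrs), interval, sorted(closure))
```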

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$, organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: subsumption hierarchy over the classes C1 to C15 with top element ⊤]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$
• The maximal elements of an ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array       [ 2  1  0  3  2 ]
Positions     1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
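To make the RMQ problem concrete, here is a minimal sketch using a sparse table. Note that this is a swapped-in, simpler technique: it needs $O(n \log n)$ preprocessing rather than the $O(n)$ quoted above, while still answering each query in $O(1)$. The function names and the 0-based indexing are assumptions of the sketch.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        half = 1 << (k - 1)
        prev = table[k - 1]
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            # Keep the leftmost index on ties.
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(table, values, lo, hi):
    """Index of the minimal value in values[lo..hi], both ends included."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
print(rmq(TABLE, ARRAY, 0, 4) + 1)  # 1-based position, as on the slide
```

For the array above, the minimum over the whole range sits at position 3 when positions are counted from 1.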

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position  1  2    3    4   5    6   7    8    9    10  11   12  13   14   15  16  17   18  19   20  21   22  23   24  25
D         0  1    2    3   2    3   2    1    2    3   2    3   2    1    0   1   2    3   2    3   2    3   2    1   0
Node      ⊤  C12  C10  C1  C10  C2  C10  C12  C11  C4  C11  C5  C11  C12  ⊤   C6  C13  C7  C13  C8  C13  C9  C13  C6  ⊤

Steps:
- $RMQ(4, 4) = 4$, giving $\{4\}$
- $RMQ(4, 18) = 15$, giving $\{4, 15\}$
- $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
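The pairwise computation can be sketched directly from the tour and depth arrays above. The code is a hedged illustration of the naive scheme, one query per pair from A and B, and it uses a plain linear scan in place of a constant-time RMQ structure. The label `TOP` stands for ⊤, and the function names are assumptions.

```python
# Euler tour of the taxonomy tree and the depth of each visited node,
# copied from the slide (TOP stands for the top element).
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First (0-based) position of every node in the tour.
FIRST = {}
for i, label in enumerate(TOUR):
    FIRST.setdefault(label, i)


def lca(x, y):
    """Least common subsumer of x and y: shallowest node between them."""
    lo, hi = sorted((FIRST[x], FIRST[y]))
    best = min(range(lo, hi + 1), key=lambda i: DEPTH[i])  # linear-scan RMQ
    return TOUR[best]


def intersect_antichains(a, b):
    """Most specific common subsumers over all pairs of a and b."""
    candidates = {lca(x, y) for x in a for y in b}
    # Drop every candidate that subsumes another candidate.
    return {c for c in candidates
            if not any(c != d and lca(c, d) == c for d in candidates)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect_antichains(A, B)))
```

On the running example, the candidates are C1, C12, C13 and the top element, and only C1 and C13 survive, which corresponds to positions 4 and 21 of the tour.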


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Figure: taxonomy tree with root ⊤; C12 and C6 below ⊤; C10 and C11 below C12; C1 and C2 below C10; C4 and C5 below C11; C13 below C6; C7, C8 and C9 below C13]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
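The scaling-based intersection can be sketched with a parent map of the tree. The parent relation below is read off the taxonomy tree of this example; the names `PARENT`, `up_filter` and `most_specific`, and the label `TOP` for ⊤, are assumptions of the sketch.

```python
# Parent of every node in the taxonomy tree (TOP is the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_filter(antichain):
    """All nodes that are equal to or above some node of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)  # None once the root is passed
    return result


def most_specific(nodes):
    """Nodes of an upward-closed set that have no child inside the set."""
    with_child_inside = {PARENT[n] for n in nodes if n in PARENT}
    return nodes - with_child_inside


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = up_filter(A) & up_filter(B)
print(sorted(most_specific(common)))
```

Each filter walks up to the root, so its size grows with the height and width of the tree, which is the cost that the RMQ-based intersection avoids.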

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Association Rules and Implications

Let $(G, M, I)$ be a formal context and $X, Y \subseteq M$

• $X \to Y$ is an association rule with
  - Support $\sigma(X \to Y) = \frac{|X' \cap Y'|}{|G|}$
  - Confidence $conf(X \to Y) = \frac{|X' \cap Y'|}{|X'|}$
• The implication $X \Rightarrow Y$ holds iff $X' \subseteq Y'$

Association rule $m_2 \to m_3$
- $\sigma(m_2 \to m_3) = 3/5 = 60\%$
- $conf(m_2 \to m_3) = 3/4 = 75\%$

Implication $m_3 \Rightarrow m_2$
- $conf(m_3 \Rightarrow m_2) = 3/3 = 100\%$

Formal context over the attributes m1, m2, m3:

g1  ×
g2  × ×
g3  × ×
g4  × ×
g5  × × ×
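The two measures can be checked with a short sketch. The column positions of the crosses are not recoverable from the table, so the context below is an assumption: one assignment with the same number of crosses per row that reproduces the 60%, 75% and 100% quoted above. The function names are hypothetical.

```python
# Assumed cross positions, consistent with the figures on the slide.
CONTEXT = {
    "g1": {"m1"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}


def extent(attrs):
    """X': the objects that have every attribute of X."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def support(x, y):
    """|X' intersected with Y'| divided by |G|."""
    return len(extent(x) & extent(y)) / len(CONTEXT)


def confidence(x, y):
    """|X' intersected with Y'| divided by |X'|."""
    return len(extent(x) & extent(y)) / len(extent(x))


def implies(x, y):
    """The implication X => Y holds iff X' is included in Y'."""
    return extent(x) <= extent(y)


print(support({"m2"}, {"m3"}), confidence({"m2"}, {"m3"}))
print(implies({"m3"}, {"m2"}), implies({"m2"}, {"m3"}))
```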


Complex Data

Formal Concept Analysis cannot deal with such complex data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

• Numeric Data
• Graphs and Molecular Structure
• Syntactic Tree
• Linked Data


Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, a set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$
• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$
• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$
• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects:
  $A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$
• The maximal set of objects sharing a given description:
  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$
• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
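The pattern concept above can be recomputed with a small sketch of interval pattern structures over the three-object table (g1 to g3). The function names are assumptions; the similarity operation and the subsumption test follow the definitions on the slides, where a description subsumes another when each of its intervals contains the matching interval.

```python
from functools import reduce

# Numeric context restricted to g1, g2, g3, as on the slide.
DATA = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Description of an object: one degenerate interval per attribute."""
    return tuple((v, v) for v in DATA[g])


def meet(d1, d2):
    """Similarity of two descriptions: componentwise convex hull."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def common_description(objects):
    """Maximal description shared by a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def subsumed(d, e):
    """True when every interval of e lies inside the matching one of d."""
    return all(a1 <= a2 and b2 <= b1 for (a1, b1), (a2, b2) in zip(d, e))


def extent(d):
    """Maximal set of objects whose description fits inside d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = common_description({"g1", "g2"})
print(d, sorted(extent(d)))
```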

Complex Data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

• Numeric data: Interval Pattern Structures [Kaytoue et al. 2011]
• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]
• Syntactic Tree [Leeuwenberg et al. 2015]
• Linked Data

Problem Statement

• Linked Data follows a distributed architecture
• Some resources contain only RDF triples
• Some resources contain only schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)

U = URI, B = Blank Nodes, L = Literal

Example of RDF Graph for Publications

NapoliLS97 --dc:creator--> Amedeo_Napoli
NapoliLS97 --dc:title--> title1
NapoliLS97 --dc:subject--> Classification

To avoid confusion, we will call objects in FCA entities.
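The position constraints of the definition can be checked mechanically. The sketch below is only an illustration: the classes `URI`, `BlankNode` and `Literal` and the function `is_rdf_triple` are hypothetical names introduced here and do not come from any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_rdf_triple(s, p, o) -> bool:
    """Check (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


paper = URI("NapoliLS97")
print(is_rdf_triple(paper, URI("dc:creator"), URI("Amedeo_Napoli")))  # True
print(is_rdf_triple(paper, URI("dc:title"), Literal("title1")))       # True
print(is_rdf_triple(Literal("title1"), URI("dc:title"), paper))       # False
```

The last call fails because a literal may only appear in the object position.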

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, a tree rooted at ⊤ with inner nodes C10 to C15 and leaves C1, C2, C4, C5, C6, C7, C8, C9]

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree rooted at ⊤ with nodes C1, C2 and C4 to C13, traversed by an Euler tour; D records the depth of each visited node]

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
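The depth array D can be reproduced with a short Euler tour. The sketch below assumes the tree shape that can be read back from the tour on the slide (⊤ is written `TOP`); the dictionary `TREE` and the function names are hypothetical, and the range minimum is found here by a plain linear scan rather than a constant-time structure.

```python
# Children of each inner node, as reconstructed from the tour on the slide.
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths, re-listing a node after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, u, v):
    """Least common ancestor = node of minimal depth between the first visits of u and v."""
    i, j = sorted((nodes.index(u), nodes.index(v)))
    k = min(range(i, j + 1), key=depths.__getitem__)
    return nodes[k]


NODES, DEPTHS = euler_tour(TREE, "TOP")
print(DEPTHS)
print(lca(NODES, DEPTHS, "C1", "C5"))  # C12
```

Any node between the first visits of two leaves lies on the path joining them, so the shallowest one is their least common ancestor.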

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

Positions 8 (C12) and 15 (⊤) are dropped because they are ancestors of position 4 (C1); positions 19 and 21 both denote C13. The intersection is therefore {C1, C13}.
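The steps of this slide can be sketched as follows. The arrays are copied from the slide (⊤ is written `TOP`); everything else is an assumption made for illustration: positions are 1-based as on the slide, RMQ is a linear scan, only neighbouring positions that come from different sets are paired (which is what happens in the slide's example), and the final filter keeps the most specific nodes.

```python
NODES = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def position(node):
    """1-based position of the first visit of a node in the tour."""
    return NODES.index(node) + 1


def rmq(i, j):
    """1-based position of the leftmost minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def lca(u, v):
    return NODES[rmq(position(u), position(v)) - 1]


def intersect(a, b):
    """Most specific common ancestors of two sets of taxonomy nodes."""
    tagged = sorted([(position(n), "A") for n in a] + [(position(n), "B") for n in b])
    hits = [rmq(p, q) for (p, s), (q, t) in zip(tagged, tagged[1:]) if s != t]
    candidates = {NODES[p - 1] for p in hits}
    # Drop every candidate that is a proper ancestor of another candidate.
    result = {c for c in candidates
              if not any(c != other and lca(c, other) == c for other in candidates)}
    return hits, result


hits, result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(hits)            # [4, 8, 15, 19, 21]
print(sorted(result))  # ['C1', 'C13']
```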

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy tree rooted at ⊤ with inner nodes C10 to C15 and leaves C1, C2, C4, C5, C6, C7, C8, C9]

[Figure: pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach ()

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

Query graph pattern:

?paper --dc:creator--> ?author
?paper --dc:title--> ?title
?paper --dc:subject--> ?keywords

Mapping $\mu : V \to U$

Given the set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification

Matching RDF graph:

NapoliLS97 --dc:creator--> Amedeo_Napoli
NapoliLS97 --dc:title--> title1
NapoliLS97 --dc:subject--> Classification

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author
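How a VIEW BY clause turns query answers into entities and attributes can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: `ROWS` holds the distinct answer rows listed on the previous slide, and `view_by` is a hypothetical helper that takes the chosen variable as the entity and every other variable binding as an attribute.

```python
from collections import defaultdict

# Distinct answer rows of the example query (variable -> bound value).
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, entity_var):
    """Group answers by one variable; the other bindings become its attributes."""
    context = defaultdict(set)
    for row in rows:
        for var, value in row.items():
            if var != entity_var:
                context[row[entity_var]].add((var, value))
    return dict(context)


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(sorted(by_title["title1"]))
# [('author', 'author1'), ('keyword', 'RDF')]
print(sorted(by_author["author2"]))
# [('keyword', 'RDF'), ('keyword', 'Web Search Engines'), ('title', 'title2')]
```

The same answers thus yield two different contexts, one whose entities are papers and one whose entities are authors.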

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

?x --rdf:type--> Person
?x --dbp:birthPlace--> Berlin
?x --dbp:birthDate--> (≤ 1900)

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

?x --dcterms:subject--> FrenchFilm

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

Alternative description through type and country:

?x --rdf:type--> Film
?x --dbo:hasCountry--> France

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Formal context over the attributes a, b (A), c (B), d (C), e (D) and f, g (E):

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
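The scaling step can be sketched as follows: every distinct (predicate, object) pair becomes one binary attribute, and a cross is placed wherever the corresponding triple exists. The function name `formal_context` is hypothetical, and only the four triples of Person1 listed on the slide are used.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def formal_context(triples):
    """Return (entities, attributes, incidence) where attributes are (predicate, object) pairs."""
    entities = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence


entities, attributes, incidence = formal_context(TRIPLES)
for entity in entities:
    row = ["x" if (entity, a) in incidence else "." for a in attributes]
    print(entity, " ".join(row))
```

With more subjects in `TRIPLES`, each additional entity gets its own row over the same attribute columns.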

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                     Confidence  Support  Meaning
$d \Rightarrow c$        100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$        100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$     100%        2        Every person whose field is computer science is a Turing Award winning scientist
$c, d \Rightarrow a, b$  100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$          71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
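The confidence values of the table can be recomputed with a few lines of Python. The exact positions of the crosses in the formal context did not survive extraction, so the `CONTEXT` below is an assumed reconstruction that merely agrees with the reported rule statistics (in particular, which of e, f, g belong to Person1 to Person3 is a guess); the helper names are hypothetical.

```python
# Assumed context, chosen to be consistent with the rule table of the slide.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def support(attrs):
    """Number of entities whose description contains all the given attributes."""
    return sum(1 for desc in CONTEXT.values() if set(attrs) <= desc)


def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return support(set(premise) | set(conclusion)) / support(premise)


def incomplete(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return sorted(e for e, desc in CONTEXT.items()
                  if set(premise) <= desc and not set(conclusion) <= desc)


print(confidence("cd", "ab"))            # 1.0
print(round(confidence("ab", "cd"), 2))  # 0.71
print(incomplete("ab", "cd"))            # ['Person6', 'Person7']
```

An implication that holds in one direction and almost holds in the other points at the entities whose descriptions may need completion.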

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition $X \equiv Y$?" and, if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        0.7          5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
  - Completing RDF-Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relation, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

350GT --dc:subject--> dbpc:Sports_cars
350GT --dc:subject--> dbpc:Lamborghini_vehicles
350GT --rdf:type--> dbo:Automobile
350GT --dbo:manufacturer--> dbp:Lamborghini
350GT --dbo:productionYear--> 1964 (xsd:date)

Why Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

Contexts $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g) and $K_{dbo:productionStartYear}$:

Entity              Crosses over a to g  dbo:productionStartYear
Reventon            × × × × × ×          $\langle [2008, 2008] \rangle$
Countach            × × × × ×            $\langle [1974, 1974] \rangle$
350GT               × × × × ×            $\langle [1963, 1963] \rangle$
400GT               × × × ×              $\langle [1965, 1965] \rangle$
Islero              × × × ×              $\langle [1967, 1967] \rangle$
Veneno              × ×                  $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×                  -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Entity              Crosses over a to g  dbo:productionStartYear
Reventon            × × × × × ×          $\langle [2008, 2008] \rangle$
Countach            × × × × ×            $\langle [1974, 1974] \rangle$
350GT               × × × × ×            $\langle [1963, 1963] \rangle$
400GT               × × × ×              $\langle [1965, 1965] \rangle$
Islero              × × × ×              $\langle [1967, 1967] \rangle$
Veneno              × ×                  $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×                  -

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
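The numeric half of this closure can be replayed in Python. The sketch below is an illustration only: the years are those of the table, the closure of the binary part is taken as given from the slide (the positions of the crosses were not recoverable), and the function names are hypothetical.

```python
# Production start years from the table (None where the value is missing).
START_YEAR = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}

# Closure of A1 in the binary contexts, as stated on the slide.
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_of(entities):
    """Smallest interval covering the years of the given entities (all must have a year)."""
    years = [START_YEAR[e] for e in entities]
    return (min(years), max(years))


def entities_in(interval):
    """All entities whose year falls inside the interval."""
    lo, hi = interval
    return {e for e, y in START_YEAR.items() if y is not None and lo <= y <= hi}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)
numeric_closure = entities_in(interval)
heterogeneous_closure = BINARY_CLOSURE & numeric_closure

print(interval)                       # (1963, 1967)
print(sorted(heterogeneous_closure))  # ['350GT', '400GT', 'Islero']
```

Intersecting the two closures gives back A1, so A1 is the extent of a heterogeneous pattern concept.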

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x               x   x
s5                                      x   x

[Figure: taxonomy tree rooted at ⊤ with inner nodes C10 to C15 and leaves C1, C2, C4, C5, C6, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$.

• The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

Finding such a position can be done with $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
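To make the RMQ problem concrete, here is a minimal Python sketch on the five-element array from the slide. It uses a sparse table, which is a swapped-in technique: it answers queries in $O(1)$ but needs $O(n \log n)$ preprocessing rather than the $O(n)$ scheme cited above. The function names `build_sparse_table` and `rmq` are illustrative assumptions.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi] (0-based, inclusive bounds)."""
    k = (hi - lo + 1).bit_length() - 1
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# The slide numbers positions from 1, so add 1 to the 0-based index.
print(rmq(array, table, 0, 4) + 1)  # 3
print(rmq(array, table, 3, 4) + 1)  # 5
```

The two overlapping power-of-two windows in `rmq` are what make the query constant time; the price is the extra logarithmic factor in the table size.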

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

One RMQ is computed for each pair of positions taken from $A$ and $B$, for example:

• RMQ(4, 4) = 4, giving $\{4\}$
• RMQ(4, 18) = 15, giving $\{4, 15\}$
• RMQ(20, 18) = 19, giving $\{4, 15, 19\}$

Result = $\{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$.
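The naive scheme can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names are hypothetical, and each RMQ is answered by a plain linear scan instead of a preprocessed structure.

```python
# Depth array D from the slide; position 1 on the slide is D[0] here.
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
     1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq_position(depths, i, j):
    """1-based position of the first minimum of depths between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    window = depths[lo - 1:hi]
    return lo + window.index(min(window))


def naive_candidates(a_positions, b_positions):
    """One RMQ per pair (a, b), hence |A| * |B| queries."""
    return sorted({rmq_position(D, a, b)
                   for a in a_positions for b in b_positions})


A = [4, 12, 20]  # positions of C1, C5, C8
B = [4, 18, 22]  # positions of C1, C7, C9
print(naive_candidates(A, B))  # [4, 8, 15, 19, 21]
```

The nine queries produce only five distinct positions, which is the redundancy the improved approach avoids.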

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain.

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
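The filter computation can be reproduced with a short Python sketch. The child-to-parent map below is read off the tree used on the slide, with the string `"TOP"` standing in for ⊤; the function names are hypothetical and the code is only meant to illustrate the scaling idea.

```python
# child -> parent edges of the example tree; "TOP" stands for the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """The node together with every node on its path to the root."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def fil(antichain):
    """Filter of an antichain: everything above at least one of its elements."""
    result = set()
    for node in antichain:
        result |= ancestors_or_self(node)
    return result


def to_antichain(nodes):
    """Keep the most specific nodes, i.e. those with no proper descendant in nodes."""
    return {n for n in nodes
            if not any(m != n and n in ancestors_or_self(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
print(sorted(common))                # ['C1', 'C10', 'C12', 'C13', 'C6', 'TOP']
print(sorted(to_antichain(common)))  # ['C1', 'C13']
```

Each filter can contain up to all nodes of the tree, which is where the $O(|T|)$ cost of this approach comes from.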

Complexity

Complexity of Naive Approach

The number of RMQs over pairs of elements is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures

Complex Data

Formal Concept Analysis cannot deal with such complex data:

• Numeric Data

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

• Graphs and Molecular Structure

• Syntactic Tree

• Linked Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects,
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them,
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$.

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
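The interval similarity operation above can be checked with a small Python sketch. Descriptions are represented as lists of `(low, high)` pairs; the names `delta` and `meet` are chosen here for illustration and are not taken from any library.

```python
# Numeric table from the slide: one row of values per object.
TABLE = {"g1": [5, 7, 6], "g2": [6, 8, 4], "g3": [4, 8, 5]}


def delta(g):
    """Describe object g by one degenerate interval [v, v] per attribute."""
    return [(v, v) for v in TABLE[g]]


def meet(d1, d2):
    """Similarity of two interval vectors: the smallest enclosing intervals."""
    return [(min(a1, a2), max(b1, b2))
            for (a1, b1), (a2, b2) in zip(d1, d2)]


print(meet(delta("g1"), delta("g2")))  # [(5, 6), (7, 8), (4, 6)]
```

Widening the intervals is what makes the result more general: it covers both objects at once.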

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:

• The maximal description representing the similarity of a set of objects:

$A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
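As a sketch of the two derivation operators, the Python below recomputes the pattern concept shown on the slide over the three-object table (g1 to g3). It assumes that $d \sqsubseteq e$ is tested as $d \sqcap e = d$; the function names are hypothetical, and `common_description` expects a non-empty set of objects.

```python
from functools import reduce

# Three-object numeric table used in the example.
TABLE = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Interval description of a single object."""
    return tuple((v, v) for v in TABLE[g])


def meet(d1, d2):
    """Component-wise smallest enclosing intervals."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumes(d, e):
    """d is more general than e when their meet is d itself."""
    return meet(d, e) == d


def common_description(objects):
    """First operator: the meet of the descriptions of a non-empty object set."""
    return reduce(meet, (delta(g) for g in sorted(objects)))


def extent(d):
    """Second operator: all objects whose description is subsumed by d."""
    return {g for g in TABLE if subsumes(d, delta(g))}


d = common_description({"g1", "g2"})
print(d)                  # ((5, 6), (7, 8), (4, 6))
print(sorted(extent(d)))  # ['g1', 'g2']
```

Object g3 is excluded because its value 4 for m1 falls outside the interval [5, 6].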

Complex Data

• Numeric Data: Interval Pattern Structures [Kaytoue et al. 2011]

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

• Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

• Syntactic Tree [Leeuwenberg et al. 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture.

• Some resources only contain RDF triples.

• Some resources only contain schema information.

• These resources belong to the same domain but share only the terms.

Solution

• Classify RDF triples based on RDF Schema.

• Allow simultaneous access to RDF triples and RDF Schema.

• Allow the user to interact with the resulting classification.

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) linked by a predicate ($U$) to Object ($U \cup B \cup L$), where U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: NapoliLS97 with edges dc:creator to Amedeo_Napoli, dc:title to title1 and dc:subject to Classification]

To avoid confusion, we will refer to objects in FCA as entities.

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy $T$, rooted at ⊤ over the classes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]
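One way to picture the step from triples to entity descriptions is the Python sketch below. It rests on an assumption not spelled out on the slide: a `dc:subject` value such as `o11` is replaced by the class it is declared a subclass of (`C1`), and values without such a declaration are kept as they are. The function name `descriptions` is hypothetical.

```python
# The six example triples (tid and provenance columns omitted).
TRIPLES = [
    ("s1", "dc:subject", "o11"),
    ("s2", "dc:subject", "o16"),
    ("s1", "rdf:type", "Publication"),
    ("o11", "rdfs:subClassOf", "C1"),
    ("o12", "rdfs:subClassOf", "C2"),
    ("C1", "rdfs:subClassOf", "C10"),
]


def descriptions(triples, predicate="dc:subject"):
    """Map each entity to the set of classes reached through `predicate`."""
    parent = {s: o for s, p, o in triples if p == "rdfs:subClassOf"}
    result = {}
    for s, p, o in triples:
        if p == predicate:
            # Assumption: lift the raw value to its declared class when known.
            result.setdefault(s, set()).add(parent.get(o, o))
    return result


print(descriptions(TRIPLES))
```

With only these six triples, s1 is described by C1 and s2 keeps the unresolved value o16; the full descriptions in the table would need the remaining triples of both resources.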

Structured Attribute Sets

• Each subject has a structured set of attributes.

• In our model, these structured sets of attributes form a taxonomy.

How to find similarity in case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of this array.

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

The depth array D is filled step by step while walking around the tree from ⊤, adding one entry each time a node is visited:

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
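The construction of the depth array can be sketched in Python as an Euler tour of the example tree. The parent-to-children map is read off the figure, `"TOP"` stands in for ⊤, and the names `euler_tour` and `lca` are illustrative; the range minimum is found by a linear scan here rather than by a preprocessed RMQ structure.

```python
# parent -> ordered children of the example tree; "TOP" stands for the root.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Record a node and its depth on entry and after returning from each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour("TOP")

# First position (1-based, as on the slide) at which each node appears.
first = {}
for position, node in enumerate(nodes, start=1):
    first.setdefault(node, position)


def lca(u, v):
    """Least common ancestor: the shallowest node between the two first visits."""
    lo, hi = sorted((first[u], first[v]))
    window = depths[lo - 1:hi]
    return nodes[lo - 1 + window.index(min(window))]


print(depths)
print(first["C1"], first["C5"], first["C8"])  # 4 12 20
print(lca("C5", "C7"), lca("C7", "C8"))       # TOP C13
```

The tour has 25 entries for this 13-node tree, matching the array on the slide.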

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$, i.e. the antichain $\{C1, C13\}$.
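The merged-list procedure can be sketched as follows. Two points are assumptions of this sketch rather than statements from the slides: only neighbouring positions carrying different tags (one from A, one from B) are queried, which happens to give exactly the five queries listed, and the final reduction to an antichain is done by a simple quadratic filter. `"TOP"` stands in for ⊤ and all function names are hypothetical.

```python
# Euler tour labels and depths copied from the slide ("TOP" stands for the root).
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
     1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First 1-based position of every node in the tour.
FIRST = {}
for position, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, position)


def rmq_position(i, j):
    """1-based position of the first minimum of D between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    window = D[lo - 1:hi]
    return lo + window.index(min(window))


def lca(u, v):
    return NODES[rmq_position(FIRST[u], FIRST[v]) - 1]


def intersect(a, b):
    """Return the queried positions and the resulting antichain."""
    tagged = sorted([(FIRST[n], "A") for n in a] + [(FIRST[n], "B") for n in b])
    # Assumption: query only neighbours that come from different sets.
    positions = [rmq_position(p, q)
                 for (p, s), (q, t) in zip(tagged, tagged[1:]) if s != t]
    candidates = {NODES[p - 1] for p in positions}
    # Drop every candidate that is a proper ancestor of another candidate.
    antichain = {u for u in candidates
                 if not any(v != u and lca(u, v) == u for v in candidates)}
    return positions, antichain


positions, antichain = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(positions)          # [4, 8, 15, 19, 21]
print(sorted(antichain))  # ['C1', 'C13']
```

Because the merged list has at most $|A| + |B|$ entries, the number of range queries stays linear in the sizes of the two antichains.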

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy rooted at ⊤ over the classes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

[Figure: Pattern Concept lattice, drawn in levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|    |T|     |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293   33207   33198        10134    45s    21s
Biomedical Data  63     1490    933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling.
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$.
• Not all datasets could be processed using the scaling approach.

What we did so far

How to allow feedback from domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS.

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering.

[Figure: concept lattice, drawn in levels from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.



4 Creating Views over RDF-Graphs

SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: graph pattern of the query. ?paper has edges dc:creator to ?author, dc:title to ?title and dc:subject to ?keywords]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification.

[Figure: matching RDF graph. NapoliLS97 has edges dc:creator to Amedeo_Napoli, dc:title to title1 and dc:subject to Classification]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title   ?keyword            ?author
title1   RDF                 author1
title2   Web Crawling        author1
title2   Web Search Engines  author2
title2   RDF                 author1
title2   RDF                 author2
title2   Web Crawling        author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \dots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$

• $M_2 = \mu(A_{v2}) = \{Web\ Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
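A rough Python sketch of what a VIEW BY clause does with the query results: the bindings of the chosen variable become the entities, and the bindings of the remaining variables become their attributes. The three rows are sample data in the shape of the result table, and `view_by` is a hypothetical helper, not part of any SPARQL engine.

```python
COLUMNS = ("title", "keyword", "author")

# A few sample result rows, one tuple per solution.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows, columns, entity_var):
    """Group rows by entity_var; the other bindings become (variable, value) attributes."""
    key = columns.index(entity_var)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[key], set())
        for column, value in zip(columns, row):
            if column != entity_var:
                attributes.add((column, value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title["title1"]))   # [('author', 'author1'), ('keyword', 'RDF')]
print(sorted(by_author["author2"])) # [('keyword', 'RDF'), ('title', 'title2')]
```

Calling the same helper with a different variable gives a different context over the same rows, which is the idea behind viewing results from several perspectives.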

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") ||
             regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") ||
             regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x has edges rdf:type to Person, dbp:birthPlace to Berlin and dbp:birthDate to a value ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: ?x has an edge dcterms:subject to FrenchFilm; equivalently, ?x has edges rdf:type to Film and dbo:hasCountry to France]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientists>

Predicates                Objects
Index  URI                Index  URI
A      dc:subject         a      dbpc:Computer_Scientists
                          b      dbpc:Turing_Award_Laureates
B      dbp:award          c      dbp:TuringAward
C      rdf:type           d      dbo:Scientist
D      dbp:field          e      dbp:Computer Sciences
E      dbp:birthPlace     f      dbo:UnitedStates
                          g      dbo:UnitedKingdom

Columns of the formal context: A (a, b), B (c), C (d), D (e), E (f, g)

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Longrightarrow Y$ and $Y \Longrightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Longrightarrow Y$ holds and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data.

Rule             Confidence  Support  Meaning
d ⟹ c            100%        5        Every scientist has won a Turing Award.
c ⟹ d            100%        5        Every person who has won a Turing Award is a scientist.
e ⟹ c, d         100%        2        Every person whose field is computer science is a Turing Award winning scientist.
c, d ⟹ a, b      100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists.
a, b → c, d      71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award.

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their descriptions.
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Longrightarrow c, d$ and $c, d \Longrightarrow a, b$.
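The confidence values and the entities needing completion can be recomputed with a short Python sketch. The exact placement of crosses in the formal context is not recoverable from the slide, so the context below is hypothetical: it was chosen only to agree with the number of crosses per person and with the rules in the table. The function names are likewise illustrative.

```python
# Hypothetical context: attribute letters a..g per person, consistent with
# the row sizes (6, 5, 5, 4, 4, 2, 2) and the rules listed on the slide.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attributes):
    """Entities that have all the given attributes."""
    wanted = set(attributes)
    return {g for g, have in CONTEXT.items() if wanted <= have}


def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(set(premise) | set(conclusion))) / len(extent(premise))


def needs_completion(premise, conclusion):
    """Entities that break the rule premise -> conclusion."""
    return extent(premise) - extent(conclusion)


print(confidence("cd", "ab"))                 # 1.0
print(round(confidence("ab", "cd"), 2))       # 0.71
print(sorted(needs_completion("ab", "cd")))   # ['Person6', 'Person7']
```

An implication in one direction plus a high-confidence rule in the other is the signal used to propose a definition, and the entities returned by `needs_completion` are the candidates whose descriptions would be completed.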

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow. Reference Universe, then Formal Context, then Mining Implications, then Ranking Implications, then the question "Can a rule be a definition $X \equiv Y$?"; if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars             Videogames   Smartphones       Countries
Restriction  dc:subject       dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type         rdf:type     rdf:type          rdf:type
             dc:subject       dc:subject   dc:subject        dc:subject
             bodyStyle        cp           manufacturer      language
             transmission     developer    operativeSystem   governmentType
             assembly         requirement  developer         leaderType
             designer         genre        cpu               foundingDate
             layout           releaseDate                    gdpPppRank

Abbreviations: FPS = First_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars    Videogames  Smartphones  Countries
# Subjects     529     655         363          3153
# Objects      1291    3265        495          8315
# Triples      12519   20146       4710         50000
# Concepts     14657   31031       1232         13754
Exec time [s]  1732    1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation as the ground truth: each list of implications has a 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities

7 Conclusion and Perspectives

Conclusion

• Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice Based View Access

• Ways to utilize this structure
  - Completing RDF-Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. cases where X → Y and Y → X are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account

• Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 59

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph for 350GT with edges dc:subject (to dbpc:Sports_cars and dbpc:Lamborghini_vehicles), rdf:type (to dbo:Automobile), dbo:manufacturer (to dbp:Lamborghini) and dbo:productionYear (to the literal 1964, xsd:date)]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

Entity              K_A to K_E (attributes a b c d e f g)  K_dbo:productionStartYear
Reventon            × × × × × ×                            ⟨[2008, 2008]⟩
Countach            × × × × ×                              ⟨[1974, 1974]⟩
350GT               × × × × ×                              ⟨[1963, 1963]⟩
400GT               × × × ×                                ⟨[1965, 1965]⟩
Islero              × × × ×                                ⟨[1967, 1967]⟩
Veneno              × ×                                    ⟨[2012, 2012]⟩
Aventador Roadster  × ×                                    -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
- $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
- $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
- $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

Entity              K_A to K_E (attributes a b c d e f g)  K_dbo:productionStartYear
Reventon            × × × × × ×                            ⟨[2008, 2008]⟩
Countach            × × × × ×                              ⟨[1974, 1974]⟩
350GT               × × × × ×                              ⟨[1963, 1963]⟩
400GT               × × × ×                                ⟨[1965, 1965]⟩
Islero              × × × ×                                ⟨[1967, 1967]⟩
Veneno              × ×                                    ⟨[2012, 2012]⟩
Aventador Roadster  × ×                                    -

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\circ\circ}$:
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
- $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$:
- $(A_1)' = \langle [1963, 1967] \rangle$ and $\langle [1963, 1967] \rangle' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
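The closure computed on this slide can be retraced with a short sketch. It is only an illustration under stated assumptions: the production years are copied from the context table, the closure of A1 in the binary contexts is taken as given from the slide (the positions of the crosses are not reproduced here), and the names `YEARS`, `BINARY_CLOSURE`, `interval_of` and `extent_of` are hypothetical.

```python
# Production start years copied from the heterogeneous context (None = no value).
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}

# Closure of A1 in the binary contexts, taken as stated on the slide.
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_of(entities):
    """Smallest interval covering the years of the given entities.

    Assumes every entity passed in has a known year.
    """
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent_of(interval):
    """All entities whose year lies inside the interval."""
    low, high = interval
    return {e for e, y in YEARS.items() if y is not None and low <= y <= high}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)                 # interval description of A1
numeric_closure = extent_of(interval)      # closure in the numeric part
heterogeneous_closure = BINARY_CLOSURE & numeric_closure
print(interval, sorted(heterogeneous_closure))
```

The intersection at the end mirrors the last step of the slide: the heterogeneous closure is the intersection of the closures obtained in each component.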

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1         | x  | x  |    |    |    | x  |    |
s2         |    |    |    |    | x  |    | x  | x
s3         |    |    | x  | x  |    |    |    |
s4         |    |    | x  |    |    | x  | x  |
s5         |    |    |    |    |    |    | x  | x

[Figure: taxonomy over the classes C1 to C15 with top element ⊤]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$

• The maximal elements of an ideal are pairwise not comparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise not comparable

Mehwish Alam Interactive Knowledge Discovery over Web of Data 67

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved with $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array
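As an illustration of answering range minimum queries in constant time, here is a minimal sketch based on a sparse table. This is a swapped-in, simpler technique: its preprocessing is $O(n \log n)$, not the $O(n)$ bound quoted above, and the function names `build_sparse_table` and `rmq` are assumptions made for this example.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # level 0: windows of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        level = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            # Keep the leftmost index on ties.
            level.append(left if values[left] <= values[right] else right)
        table.append(level)
        k += 1
    return table


def rmq(table, values, lo, hi):
    """Index of the minimum of values[lo..hi] (0-based, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]           # the array shown on the slide
table = build_sparse_table(array)
print(rmq(table, array, 0, 4) + 1)  # 1-based position of the minimum over the whole array
```

Each query combines two overlapping power-of-two windows, which is why no loop is needed at query time.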

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the positions in D are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQs are computed between positions of A and positions of B, for example:
- RMQ(4, 4) = 4
- RMQ(4, 18) = 15
- RMQ(20, 18) = 19

Result = {4, 8, 15, 19, 21} = {4, 21}

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
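The filter-based intersection above can be retraced with a small sketch. It assumes the taxonomy is the tree implied by the depth array D used in the RMQ slides, writes the top element ⊤ as "T", and uses the hypothetical names `PARENT`, `filter_of` and `antichain_of`.

```python
# Child -> parent map of the taxonomy tree ("T" stands for the top element).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All elements that are equal to or above some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)  # None once the root is passed
    return result


def antichain_of(filter_set):
    """Most specific elements of a filter: those with no child inside it."""
    parents_inside = {PARENT[n] for n in filter_set if n in PARENT}
    return filter_set - parents_inside


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common), sorted(antichain_of(common)))
```

Every filter contains the root, so each intersection walks a set whose size depends on the whole taxonomy rather than on the two antichains only.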

Mehwish Alam Interactive Knowledge Discovery over Web of Data 70

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$ and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is costlier than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

Mehwish Alam Interactive Knowledge Discovery over Web of Data 71

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is costlier than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

Mehwish Alam Interactive Knowledge Discovery over Web of Data 72

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Complex Data

Formal Concept Analysis cannot deal with such complex data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Numeric Data

Graphs and Molecular Structure

Syntactic Tree

Linked Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 13


Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of:
- $G$, a set of objects
- $(D, \sqcap)$, the set of descriptions with a similarity operation over them
- $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
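The interval similarity operation above translates directly into code. The following is a minimal sketch; the function names `delta` and `meet` are assumptions chosen to match the symbols on the slide.

```python
def delta(row):
    """Describe an object by one degenerate interval [v, v] per attribute."""
    return tuple((v, v) for v in row)


def meet(d1, d2):
    """Component-wise similarity: the smallest interval covering both intervals."""
    return tuple(
        (min(a1, a2), max(b1, b2))
        for (a1, b1), (a2, b2) in zip(d1, d2)
    )


g1 = (5, 7, 6)
g2 = (6, 8, 4)
print(meet(delta(g1), delta(g2)))
```

Because `meet` only takes minima and maxima, it is commutative, associative and idempotent, which is what makes the descriptions a semilattice.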

Mehwish Alam Interactive Knowledge Discovery over Web of Data 14

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:

• The maximal description representing the similarity of a set of objects:

$A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
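Both derivation operators can be checked on the three-object table used in this example. This is a sketch under assumptions: the names `CONTEXT`, `description` and `extent` are hypothetical, and `description` expects a non-empty set of objects.

```python
from functools import reduce

# The three-object numeric context from the example.
CONTEXT = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}


def delta(g):
    """Degenerate interval description of a single object."""
    return tuple((v, v) for v in CONTEXT[g])


def meet(d1, d2):
    """Smallest intervals covering both descriptions, component by component."""
    return tuple(
        (min(a1, a2), max(b1, b2))
        for (a1, b1), (a2, b2) in zip(d1, d2)
    )


def description(objects):
    """Similarity of a non-empty set of objects (first derivation operator)."""
    return reduce(meet, (delta(g) for g in sorted(objects)))


def extent(d):
    """Objects whose values fall inside every interval of d (second operator)."""
    return {
        g for g, row in CONTEXT.items()
        if all(low <= v <= high for v, (low, high) in zip(row, d))
    }


d = description({"g1", "g2"})
print(d, sorted(extent(d)))
```

A pair (extent, description) is a pattern concept when each component is the image of the other, which is what the two printed values show for {g1, g2}.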

Mehwish Alam Interactive Knowledge Discovery over Web of Data 15

Complex Data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Interval Pattern Structures [Kaytoue et al 2011]

Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

Syntactic Tree [Leeuwenberg et al 2015]

Linked Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 16

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Mehwish Alam Interactive Knowledge Discovery over Web of Data 17

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization


3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: a Subject node ($U \cup B$) linked to an Object node ($U \cup B \cup L$) by a predicate edge ($U$); U = URI, B = Blank Nodes, L = Literal]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 20


Example of RDF Graph for Publications

[Figure: RDF graph in which NapoliLS97 is linked to title1 by dc:title, to Amedeo_Napoli by dc:creator and to Classification by dc:subject]

To avoid confusion, we refer to objects in FCA as entities.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 20

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy T over the classes C1 to C15, with top element ⊤]
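One way to read the two tables together is sketched below. It is an assumption-laden illustration, not the method of the slides: it supposes that each dc:subject keyword is replaced by the class it is declared a subclass of, uses only the six triples listed above, and the names `TRIPLES` and `descriptions` are hypothetical.

```python
# (subject, predicate, object, provenance) rows copied from the triple table.
TRIPLES = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]


def descriptions(triples):
    """Map each entity to the set of taxonomy classes of its keywords."""
    class_of = {s: o for s, p, o, _ in triples if p == "rdfs:subClassOf"}
    result = {}
    for s, p, o, _ in triples:
        if p == "dc:subject":
            classes = result.setdefault(s, set())
            if o in class_of:  # keyword with a known class in the schema source
                classes.add(class_of[o])
    return result


print(descriptions(TRIPLES))
```

With only these six triples, s1 is described by C1 and s2 has no class yet, because no schema triple for o16 appears in the excerpt.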

Mehwish Alam Interactive Knowledge Discovery over Web of Data 21

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets

• Scaling
• Intersection of antichains
- Extended intersection operation [Carpineto and Romano 1996]
- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query: An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 22

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

The array D is filled step by step during a depth-first traversal of the tree that starts and ends at ⊤: each time a node is visited, its depth is appended to D.

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
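The construction of D and its use for computing a lowest common ancestor can be sketched as follows. This is only an illustration: the tree is the one implied by the array above, "T" stands for ⊤, the range minimum is found by a plain linear scan rather than a constant-time structure, and the names `TREE`, `euler_tour` and `lca` are assumptions.

```python
# Parent -> children map of the taxonomy ("T" stands for the top element).
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Visit the tree depth-first, recording a node each time it is entered
    and each time the traversal returns to it from a child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, D = euler_tour(TREE, "T")
FIRST = {}
for index, name in enumerate(NODES):
    FIRST.setdefault(name, index)  # first occurrence of each node (0-based)


def lca(u, v):
    """Lowest common ancestor: the shallowest node between the two occurrences."""
    lo, hi = sorted((FIRST[u], FIRST[v]))
    best = min(range(lo, hi + 1), key=lambda i: D[i])
    return NODES[best]


print(D)
print(lca("C1", "C5"), lca("C7", "C8"))
```

The printed depth list can be compared with the array D on the slide; positions in the code are 0-based, while the slide counts from 1.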

Mehwish Alam Interactive Knowledge Discovery over Web of Data 23

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the antichain {C1, C13}
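The steps above can be sketched in a few lines. The arrays are copied from the slide, with "T" standing for ⊤; the range minimum is a linear scan for brevity, the final filtering step (dropping every candidate that is an ancestor of another candidate) is an assumption about how {4, 8, 15, 19, 21} reduces to {4, 21}, and the names `rmq`, `lca` and `meet` are hypothetical.

```python
# Depth array and visited nodes as printed on the slide (positions are 1-based).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "T"]

FIRST = {}
for pos, name in enumerate(NODES, start=1):
    FIRST.setdefault(name, pos)  # first position of each node


def rmq(i, j):
    """1-based position of the minimal depth in D[i..j], leftmost on ties."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def lca(u, v):
    return NODES[rmq(FIRST[u], FIRST[v]) - 1]


def meet(a, b):
    """Intersect two antichains of leaves using RMQs of consecutive positions."""
    merged = sorted([FIRST[x] for x in a] + [FIRST[x] for x in b])
    candidates = [rmq(p, q) for p, q in zip(merged, merged[1:])]
    names = {NODES[p - 1] for p in candidates}
    # Keep only the most specific candidates: drop any ancestor of another one.
    return candidates, {
        u for u in names
        if not any(u != v and lca(u, v) == u for v in names)
    }


positions, result = meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(positions, sorted(result))
```

Only consecutive positions of the merged list are queried, so the number of RMQs grows with |A| + |B| rather than with |A| times |B|.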

Mehwish Alam Interactive Knowledge Discovery over Web of Data 24

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy over the classes C1 to C15 with top element ⊤]

[Figure: pattern concept lattice over the concepts K0 to K11, with K0 at the top and K11 at the bottom]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Mehwish Alam Interactive Knowledge Discovery over Web of Data 25

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach

Mehwish Alam Interactive Knowledge Discovery over Web of Data 26

What we did so far

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

How to allow feedback from domain expert

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice over the concepts K0 to K11, with K0 at the top and K11 at the bottom]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 28


Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization


4 Creating Views over RDF-Graphs


SPARQL

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX dc: <http://purl.org/dc/terms/>
  SELECT DISTINCT ?title ?keywords ?author
  WHERE {
    ?paper dc:creator ?author .
    ?paper dc:title ?title .
    ?paper dc:subject ?keywords .
    FILTER (
      regex(STR(?keywords), "pattern based classification", "i")
      || regex(STR(?keywords), "unsupervised classification", "i"))
  }

[Figure: query graph pattern: ?paper --dc:title--> ?title; ?paper --dc:creator--> ?author; ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification.

[Figure: matched graph: NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:subject--> Classification]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

?title    ?keyword             ?author
title1    RDF                  author1
title2    Web Crawling         author1
title2    Web Search Engines   author2
title2    RDF                  author1
title2    RDF                  author2
title2    Web Crawling         author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \ldots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$

• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
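The VIEW BY step can be pictured as grouping the answer rows by the chosen variable, which becomes the set of entities G, while the values of the remaining variables become attributes. A minimal Python sketch under that reading; `view_by` and `COLUMNS` are hypothetical names used only for illustration, not part of the LBVA implementation, and the rows are the ones listed on the slide.

```python
# Answer rows (title, keyword, author) as listed on the slide.
COLUMNS = ("title", "keyword", "author")
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]

def view_by(rows, entity_var):
    """Hypothetical helper: map each value of entity_var to its attribute set.

    Attributes are (variable, value) pairs so that authors and keywords
    stay distinguishable, in the spirit of M1 and M2 on the slide.
    """
    idx = COLUMNS.index(entity_var)
    context = {}
    for row in rows:
        attrs = {(COLUMNS[i], v) for i, v in enumerate(row) if i != idx}
        context.setdefault(row[idx], set()).update(attrs)
    return context

by_title = view_by(rows, "title")    # VIEW BY ?title
by_author = view_by(rows, "author")  # VIEW BY ?author, another perspective

for entity, attrs in sorted(by_title.items()):
    print(entity, sorted(attrs))
```

Switching the second argument changes the perspective without touching the query answers.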

Views from Different Perspectives

Example

  SELECT ?title ?author ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?title

Example

  SELECT ?author ?title ?keyword WHERE {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?paper dc:subject ?keyword .
    FILTER (regex(?keyword, "Web Crawling", "i") ||
            regex(?keyword, "RDF", "i"))
  }
  VIEW BY ?author

Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x --rdf:type--> Person; ?x --dbp:birthPlace--> Berlin; ?x --dbp:birthDate--> $\leq 1900$]

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Person .
    ?x dbp:birthPlace dbp:Berlin .
    ?x dbp:birthDate ?d .
    FILTER (?d <= 1900)
  }

French Films

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm]

  SELECT ?x
  WHERE {
    ?x dcterms:subject category:FrenchFilm
  }

[Figure: graph pattern: ?x --rdf:type--> Film; ?x --dbo:hasCountry--> France]

  SELECT ?x
  WHERE {
    ?x rdf:type dbo:Film .
    ?x dbo:hasCountry dbp:France
  }

From RDF Triples to Formal Context

RDF triples

  <Person1, dc:subject, dbpc:Computer_Scientists>
  <Person1, dc:subject, dbpc:Turing_Award_Laureates>
  <Person1, dbp:field, dbp:Computer Sciences>
  <Person1, rdf:type, dbo:Scientists>

Predicates                 Objects
Index  URI                 Index  URI
A      dc:subject          a      dbpc:Computer_Scientists
                           b      dbpc:Turing_Award_Laureates
B      dbp:award           c      dbp:TuringAward
C      rdf:type            d      dbo:Scientist
D      dbp:field           e      dbp:Computer Sciences
E      dbp:birthPlace      f      dbo:UnitedStates
                           g      dbo:UnitedKingdom

[Cross table: rows Person1 to Person7, columns a to g grouped under the predicates A (a, b), B (c), C (d), D (e), E (f, g). The column positions of the crosses did not survive extraction; crosses per row: Person1: 6, Person2: 5, Person3: 5, Person4: 4, Person5: 4, Person6: 2, Person7: 2.]

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
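As an illustration of the scaling step, the sketch below turns subject-predicate-object triples into a formal context whose attributes are (predicate, object) pairs. It uses only the four Person1 triples quoted on the slide; the function name `to_formal_context` is an assumption made for this example.

```python
# The four example triples from the slide, written as Python tuples.
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]

def to_formal_context(triples):
    """Hypothetical helper: entities are subjects, attributes are
    (predicate, object) pairs, and each triple yields one cross."""
    entities = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence

entities, attributes, incidence = to_formal_context(triples)

# Print one row of crosses per entity.
for g in entities:
    crosses = ["x" if (g, m) in incidence else "." for m in attributes]
    print(g, " ".join(crosses))
```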

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data.

Rule                           Confidence  Support  Meaning
$d \Rightarrow c$              100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$              100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$           100%        2        All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$        100%        5        All the Scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$                71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \to \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description.
• In fact, there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
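The confidence test behind near-definitions can be sketched in a few lines of Python. The context below is an assumption: the cross table is not recoverable from this transcript, so the rows were chosen only to agree with the supports and confidences quoted on the slide (in particular, the crosses for e, f and g are guesses).

```python
from fractions import Fraction

# Hypothetical context, consistent with the rule table on the slide.
context = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in context.items() if attrs <= desc}

def confidence(premise, conclusion):
    """conf(premise -> conclusion) as an exact fraction."""
    return Fraction(len(extent(premise | conclusion)), len(extent(premise)))

forward = confidence({"c", "d"}, {"a", "b"})   # an implication (confidence 1)
backward = confidence({"a", "b"}, {"c", "d"})  # high-confidence rule only

# Entities that break the backward rule are the candidates for completion.
to_complete = extent({"a", "b"}) - extent({"c", "d"})
print(forward, backward, sorted(to_complete))
```

With this assumed context, the entities violating the backward rule are exactly the ones the slide singles out for completion.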

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    govenmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS = Front_Person_Shooters; cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset          Cars     Videogames   Smartphones   Countries
Subjects         529      655          363           3153
Objects          1291     3265         495           8315
Triples          12519    20146        4710          50000
Concepts         14657    31031        1232          13754
Exec. time [s]   1732     1714         07            5982

(The decimal points of the execution times were lost in this transcript.)

Dataset characteristics

Evaluation

• There is no ground truth for the triples to be added to each dataset.
• Human evaluation is used as the ground truth: each list of implications has 100% recall.

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)


RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Points of View

• Hide non-interesting parts of the lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  • Completing RDF Data using Association Rule Mining
  • Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e. cases where $X \to Y$ and $Y \to X$ are association rules with high confidence?

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may contain different data types and structures, including dates, numbers, collections and strings.

• To deal with such data we use Pattern Structures.

[Figure: RDF graph around 350GT: 350GT --dc:subject--> dbpc:Sports_cars; 350GT --dc:subject--> dbpc:Lamborghini_vehicles; 350GT --rdf:type--> dbo:Automobile; 350GT --dbo:manufacturer--> dbp:Lamborghini; 350GT --dbo:productionYear--> 1964 (xsd:date)]

Why Pattern Structures

[Table: heterogeneous context with rows Reventon, Countach, 350GT, 400GT, Islero, Veneno and Aventador Roadster; binary columns a to g grouped as $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g); and one interval column $K_{dbo:productionStartYear}$. The column positions of the crosses did not survive extraction.]

Entity               Crosses   dbo:productionStartYear
Reventon             6         $\langle [2008, 2008] \rangle$
Countach             5         $\langle [1974, 1974] \rangle$
350GT                5         $\langle [1963, 1963] \rangle$
400GT                4         $\langle [1965, 1965] \rangle$
Islero               4         $\langle [1967, 1967] \rangle$
Veneno               2         $\langle [2012, 2012] \rangle$
Aventador Roadster   2         -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $\mathrm{range}(p) \subseteq U$, $p$ is an object relation
  - if $\mathrm{range}(p) \subseteq L$, $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

[Table: the heterogeneous context with numeric values: entities Reventon, Countach, 350GT, 400GT, Islero, Veneno and Aventador Roadster, binary attributes a to g, and the production start years 2008, 1974, 1963, 1965, 1967, 2012 and "-" respectively]

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\square\square}$ (binary part):
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$ (interval part):
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
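The closure computed on this slide can be replayed with a short Python sketch: close the entity set separately on the binary part and on the interval part, then intersect the two extents. The production years are those shown in the table; the binary crosses are an assumption (their positions are not recoverable from this transcript) chosen to match the cross counts and the intents stated on the slide. All function names are hypothetical.

```python
# Assumed binary descriptions (only the counts per row come from the slide).
binary = {
    "Reventon": {"a", "b", "c", "d", "e", "f"},
    "Countach": {"a", "b", "c", "d", "e"},
    "350GT": {"a", "b", "c", "d", "g"},
    "400GT": {"a", "b", "c", "d"},
    "Islero": {"a", "b", "c", "d"},
    "Veneno": {"a", "b"},
    "Aventador Roadster": {"a", "b"},
}
# Production start years from the slide; Aventador Roadster has no value.
year = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
        "400GT": 1965, "Islero": 1967, "Veneno": 2012}

def common_attrs(entities):
    """Binary intent: attributes shared by all the entities."""
    return set.intersection(*(binary[g] for g in entities))

def binary_extent(attrs):
    return {g for g, desc in binary.items() if attrs <= desc}

def interval_of(entities):
    """Interval intent: smallest interval covering the entities' years.
    Assumes every entity in the set has a year."""
    years = [year[g] for g in entities]
    return (min(years), max(years))

def interval_extent(interval):
    lo, hi = interval
    return {g for g, y in year.items() if lo <= y <= hi}

def closure(entities):
    """Intersect the closures obtained on the binary and interval parts."""
    return binary_extent(common_attrs(entities)) & interval_extent(interval_of(entities))

A1 = {"350GT", "400GT", "Islero"}
print(sorted(common_attrs(A1)), interval_of(A1), sorted(closure(A1)))
```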

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S   C1   C2   C4   C5   C6   C7   C8   C9
s1           x    x                   x
s2                               x         x    x
s3                     x    x
s4                     x              x    x
s5                                         x    x

[Figure: taxonomy rooted at $\top$ with the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$.

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved with $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
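To make the RMQ problem concrete, here is a small Python sketch on the array from the slide. It uses a sparse table, which is a swapped-in, simpler technique: queries are O(1), but preprocessing is O(n log n) rather than the O(n) bound quoted above. Indices are 0-based in the code, so slide position 3 is index 2.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi] (inclusive, 0-based)."""
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 0, 4) + 1)  # 1-based position of the minimum of the whole array
```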

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions in the tour are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D         = [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Tour      = $\top$ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 $\top$ C6 C13 C7 C13 C8 C13 C9 C13 C6 $\top$
Positions = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Steps shown on the slide:

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C1, C13\}$
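The naive procedure can be sketched in Python as follows: build the Euler tour and depth array of the taxonomy, answer one RMQ per pair (a, b) to get the least common ancestor, and keep only the most specific results. The tree is read off the tour shown on the slide, with "TOP" standing for the root. The RMQ here is a plain linear scan, so this is an illustration of the idea rather than the constant-time scheme.

```python
from itertools import product

# Taxonomy read off the Euler tour on the slide.
children = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Return the tour of nodes and the parallel array of depths."""
    tour, depth = [], []
    def visit(node, d):
        tour.append(node)
        depth.append(d)
        for child in children.get(node, []):
            visit(child, d + 1)
            tour.append(node)  # come back to the parent after each child
            depth.append(d)
    visit(root, 0)
    return tour, depth

tour, depth = euler_tour("TOP")
first = {}
for pos, node in enumerate(tour):
    first.setdefault(node, pos)  # first occurrence of each node (0-based)

def lca(x, y):
    lo, hi = sorted((first[x], first[y]))
    best = min(range(lo, hi + 1), key=depth.__getitem__)  # linear-scan RMQ
    return tour[best]

def is_proper_ancestor(x, y):
    return x != y and lca(x, y) == x

def intersect_antichains(A, B):
    """One RMQ per pair, then drop results lying above another result."""
    candidates = {lca(a, b) for a, b in product(A, B)}
    return {c for c in candidates
            if not any(is_proper_ancestor(c, other) for other in candidates)}

result = intersect_antichains({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(result))
```

The pairwise loop is what gives the quadratic number of queries discussed in the complexity slide.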


An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain.

[Figure: taxonomy tree]
$\top$
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
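The scaling route can be checked with a few lines of Python: compute the filter (up-set) of each antichain, intersect the filters, and keep the most specific elements. The `parent` map encodes the taxonomy of this slide, with "TOP" standing for the root; the helper names are illustrative only.

```python
# Child -> parent map of the taxonomy shown on the slide.
parent = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in parent:
        node = parent[node]
        result.add(node)
    return result

def filter_of(antichain):
    """Union of the up-sets of the antichain's elements."""
    return set().union(*(up_set(x) for x in antichain))

def most_specific(elements):
    """Drop every element lying strictly above another element of the set."""
    return {x for x in elements
            if not any(x in up_set(y) - {y} for y in elements)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common), sorted(most_specific(common)))
```

Each filter can contain up to every node of the tree, which is why this route costs more than working on the antichains directly.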

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 22: Interactive Knowledge Discovery over Web of Data


Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of:
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
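The interval similarity operation above translates directly into code. A minimal Python sketch, using the three-object table from the slide; `delta` and `meet` are illustrative names for the mapping and the similarity operation.

```python
# Numerical table from the slide: object -> (m1, m2, m3).
table = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}

def delta(values):
    """Describe an object by degenerate intervals [v, v], one per attribute."""
    return tuple((v, v) for v in values)

def meet(d1, d2):
    """Component-wise similarity: the smallest interval covering both."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))

d12 = meet(delta(table["g1"]), delta(table["g2"]))
print(d12)  # description shared by g1 and g2
```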

Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as follows.

• The maximal description representing the similarity of a set of objects:

  $A^{\square} = \sqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

  $d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
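Both derivation operators can be written out for interval descriptions, which makes it easy to check that the pair on this slide is indeed a pattern concept. The sketch below assumes the three-object table (g1, g2, g3) from the earlier Pattern Structures slide; `intent`, `extent` and `subsumes` are illustrative names.

```python
from functools import reduce

# Three-object table assumed from the earlier slide: object -> (m1, m2, m3).
table = {"g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5)}

def delta(g):
    return tuple((v, v) for v in table[g])

def meet(d1, d2):
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))

def subsumes(d, e):
    """d is more general than e: each interval of d contains that of e."""
    return all(a <= c and f <= b for (a, b), (c, f) in zip(d, e))

def intent(objects):
    """Maximal common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))

def extent(description):
    """All objects whose description is subsumed by the given one."""
    return {g for g in table if subsumes(description, delta(g))}

d = intent({"g1", "g2"})
print(d, sorted(extent(d)))
```

Here g3 falls outside the extent because its m1 value 4 is not in [5, 6], so applying the two operators in turn returns the starting set.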

Complex Data

      m1   m2   m3
g1    5    7    6
g2    6    8    4
g3    4    8    5
g4    4    9    8
g5    5    8    5

• Interval Pattern Structures [Kaytoue et al. 2011]

• Graphs and Molecular Structures [Kuznetsov and Samokhin 2005]

• Syntactic Trees [Leeuwenberg et al. 2015]

• Linked Data

Problem Statement

• Linked Data follows a distributed architecture.

• Some resources only contain RDF triples.

• Some resources only contain schema information.

• These resources belong to the same domain but share only the terms.

Solution

• Classify RDF triples based on RDF Schema.

• Allow simultaneous access to RDF triples and RDF Schema.

• Allow the user to interact with the resulting classification.

Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$), where U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:subject--> Classification]

To avoid confusion, we will call the objects of FCA entities.

RDF to Entity-Descriptions

tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy $T$, rooted at $\top$, with the classes C1, C2 and C4 to C15]
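One way to picture the step from the triple table to the entity descriptions is a join between the dc:subject triples (from DBLP) and the rdfs:subClassOf triples (from ACCS). The Python sketch below is an assumption about that step, not the authors' procedure: each keyword is replaced by its direct superclass when one is known, and left unchanged otherwise. Only the six triples t1 to t6 from the slide are used, so the output is a fragment of the descriptions table.

```python
# Triples t1..t6 from the slide: (subject, predicate, object, provenance).
triples = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]

def descriptions(triples):
    """Hypothetical join: map each paper to the classes of its keywords."""
    superclass = {s: o for s, p, o, _ in triples if p == "rdfs:subClassOf"}
    result = {}
    for s, p, o, _ in triples:
        if p == "dc:subject":
            # Fall back to the raw keyword when no class is known for it.
            result.setdefault(s, set()).add(superclass.get(o, o))
    return result

desc = descriptions(triples)
print(desc)
```

The class of o16 is not among the six triples shown, so s2 keeps the raw keyword in this fragment.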

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

Taxonomy used in the example:

⊤
  C12
    C10
      C1
      C2
    C11
      C4
      C5
  C6
    C13
      C7
      C8
      C9

The array D is filled in step by step during a depth-first traversal of the tree: D[i] is the depth of the node visited at position i, and a node is recorded again each time the traversal returns to it from a child.

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
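The construction of D can be sketched as a depth-first traversal in Python. This is an illustrative sketch: the dictionary `CHILDREN` encodes the taxonomy of the slide, `"TOP"` is a stand-in name for the root symbol, and the function name `euler_tour` is an assumption (the slides do not name the traversal).

```python
# Taxonomy from the slide, written as parent -> ordered list of children.
# "TOP" is a hypothetical name for the root symbol of the slide.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(children, root):
    """Return (nodes, depths): the visited node and its depth at every step."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            # Coming back from a child, the parent is recorded again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, D = euler_tour(CHILDREN, "TOP")
print(D)
print(NODES)
```

The lists are 0-based in Python, whereas the slide numbers positions from 1, so position 4 on the slide (C1) is `NODES[3]` here.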

Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Each class is replaced by its position in the array:

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}

Positions 8 (C12) and 15 (⊤) are dropped because they are ancestors of position 4 (C1), and positions 19 and 21 both denote C13, so the intersection is {C1, C13}.
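The worked example above can be reproduced with a short Python sketch. Several points are assumptions rather than facts from the slides: the range minimum is found by a plain linear scan (not a constant-time structure), only neighbouring positions that come from different sets are queried, and the names `rmq`, `is_ancestor` and `intersect` are hypothetical.

```python
# Euler tour of the slide's taxonomy ("TOP" stands for the root symbol).
# Python lists are 0-based; the slide numbers positions from 1.
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First tour position (1-based, as on the slide) of every class.
FIRST = {}
for position, name in enumerate(NODES, start=1):
    FIRST.setdefault(name, position)


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda k: D[k - 1])


def is_ancestor(a, b):
    """True if class a is a proper ancestor of class b."""
    return a != b and NODES[rmq(FIRST[a], FIRST[b]) - 1] == a


def intersect(a_classes, b_classes):
    """Most specific common subsumers of two sets of classes."""
    merged = sorted([(FIRST[c], "A") for c in a_classes] +
                    [(FIRST[c], "B") for c in b_classes])
    candidates = set()
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:  # assumption: only pairs mixing A and B are queried
            candidates.add(NODES[rmq(p, q) - 1])
    # Drop every candidate that is an ancestor of another candidate.
    return {c for c in candidates
            if not any(is_ancestor(c, other) for other in candidates)}


RESULT = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(RESULT))
```

Because the two position lists are merged in sorted order, the number of range queries grows with |A| + |B| rather than with |A| × |B|.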

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]

[Figure: pattern concept lattice with concepts K0 to K11]

Details of the pattern concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through interordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \to U$

Given the set of URIs $U$ and the set of query variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords → Classification.

[Figure: matching RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v$ = {title}

• $A_v$ = {author, keyword}

• $G = \mu(E_v)$ = {title1, title2}

• $M_1 = \mu(A_{v_1})$ = {author1, author2}

• $M_2 = \mu(A_{v_2})$ = {WebCrawling, RDF}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
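How a VIEW BY clause could turn a result table into entities and attribute sets can be sketched in Python. This is only an illustration under stated assumptions: the rows are copied from the result table of the slide, while the function `view_by` and the representation of attributes as (column, value) pairs are hypothetical choices, not the authors' implementation.

```python
# Result rows (title, keyword, author) copied from the slide's table.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, entity_column):
    """Group rows by the VIEW BY column; the other columns become attributes."""
    idx = columns.index(entity_column)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[idx], set())
        for name, value in zip(columns, row):
            if name != entity_column:
                attrs.add((name, value))
    return context


BY_TITLE = view_by(ROWS, COLUMNS, "title")    # entities are titles
BY_AUTHOR = view_by(ROWS, COLUMNS, "author")  # entities are authors
print(sorted(BY_TITLE))
print(sorted(BY_AUTHOR["author2"]))
```

Calling the same function with a different entity column gives the other perspective on the same answers, which is the effect of changing VIEW BY ?title into VIEW BY ?author.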

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> value ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern ?x --dcterms:subject--> FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: graph pattern ?x --rdf:type--> Film, ?x --dbo:hasCountry--> France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Attribute columns: A = {a, b}, B = {c}, C = {d}, D = {e}, E = {f, g}

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
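The scaling step, where every (predicate, object) pair becomes a binary attribute, can be sketched in Python. This is a minimal sketch under assumptions: the Person1 triples are the ones listed on the slide, the two Person6 triples are hypothetical and only added to obtain a second row, and `formal_context` and `print_cross_table` are illustrative names.

```python
# Triples for Person1 as listed on the slide; the Person6 triples are a
# hypothetical addition used only to make the context non-trivial.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]


def formal_context(triples):
    """Return (G, M, I): entities, (predicate, object) attributes, incidence."""
    incidence = {(s, (p, o)) for s, p, o in triples}
    entities = sorted({g for g, _ in incidence})
    attributes = sorted({m for _, m in incidence})
    return entities, attributes, incidence


def print_cross_table(entities, attributes, incidence):
    """Print one row per entity, with 'x' where the triple exists."""
    for g in entities:
        row = " ".join("x" if (g, m) in incidence else "." for m in attributes)
        print(f"{g:8} {row}")


G, M, I = formal_context(TRIPLES)
print_cross_table(G, M, I)
```

Each element of the incidence set corresponds to exactly one triple, which is the reading of the crosses given in the caption of the context.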

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                       Confidence  Support  Meaning
$d \Rightarrow c$          100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$          100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$       100%        2        All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$    100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$            71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $\{a, b\} \equiv \{c, d\}$, i.e., $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$
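The confidence of 0.71 and the two entities needing completion can be checked with a small Python sketch. The context below is a hypothetical reduction of the running example to the attributes a, b, c and d: it assumes Person1 to Person5 carry all four and Person6 and Person7 only a and b, which is consistent with the supports in the rule table. The function names are illustrative.

```python
# Hypothetical reduction of the running example to attributes a, b, c, d:
# Person1..Person5 carry all four, Person6 and Person7 only a and b.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT["Person6"] = {"a", "b"}
CONTEXT["Person7"] = {"a", "b"}


def support(context, attrs):
    """Number of entities whose description contains all of attrs."""
    return sum(1 for description in context.values() if attrs <= description)


def confidence(context, premise, conclusion):
    """conf(premise -> conclusion) = supp(premise and conclusion) / supp(premise)."""
    return support(context, premise | conclusion) / support(context, premise)


def incomplete_entities(context, premise, conclusion):
    """Entities matching the premise but missing part of the conclusion."""
    return sorted(g for g, description in context.items()
                  if premise <= description and not conclusion <= description)


AB, CD = {"a", "b"}, {"c", "d"}
print(confidence(CONTEXT, CD, AB))            # implication: confidence 1.0
print(round(confidence(CONTEXT, AB, CD), 2))  # association rule
print(incomplete_entities(CONTEXT, AB, CD))   # candidates for completion
```

An implication is simply a rule of confidence 1, so a pair of rules where one direction is exact and the other is only nearly exact points to the entities returned by `incomplete_entities`.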

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

             Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset characteristics

               Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for Cars, Smartphones, Countries and Videogames]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Points of View

• Hide non-interesting parts of the lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments

Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O. and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y. and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C. and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K. and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph of 350GT: dc:subject dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Binary contexts K_A, K_B, K_C, K_D, K_E over the attributes a b c d e f g, and numeric context K_dbo:productionStartYear:

Entity              a to g       K_dbo:productionStartYear
Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Entity              a to g       K_dbo:productionStartYear
Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
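The closure of A1 in the heterogeneous setting can be sketched in Python as the combination of a set intersection (binary part) and an interval hull (numeric part). This is a sketch under explicit assumptions: the binary attributes are restricted to a, b, c and d, which the slide states are shared by the first five cars; the two attributes given to Veneno and Aventador Roadster are hypothetical, since the slide only shows two crosses for each; and `describe` and `matching` are illustrative names.

```python
# Cars from the slide. Binary attributes are restricted to a..d; the attributes
# of Veneno and Aventador Roadster are an assumption (two crosses on the slide).
CARS = {
    "Reventon": ({"a", "b", "c", "d"}, 2008),
    "Countach": ({"a", "b", "c", "d"}, 1974),
    "350GT": ({"a", "b", "c", "d"}, 1963),
    "400GT": ({"a", "b", "c", "d"}, 1965),
    "Islero": ({"a", "b", "c", "d"}, 1967),
    "Veneno": ({"a", "b"}, 2012),
    "Aventador Roadster": ({"a", "b"}, None),  # no production year on the slide
}


def describe(entities):
    """Common description: shared attributes and the interval spanning the years."""
    attrs = set.intersection(*(CARS[e][0] for e in entities))
    years = [CARS[e][1] for e in entities]
    if any(year is None for year in years):
        return attrs, None  # assumption: a missing year yields no interval
    return attrs, (min(years), max(years))


def matching(attrs, interval):
    """Entities having all attrs and, if an interval is given, a year inside it."""
    result = set()
    for name, (own_attrs, year) in CARS.items():
        if not attrs <= own_attrs:
            continue
        if interval is not None:
            if year is None or not interval[0] <= year <= interval[1]:
                continue
        result.add(name)
    return result


A1 = {"350GT", "400GT", "Islero"}
INTENT = describe(A1)
EXTENT = matching(*INTENT)
print(INTENT)
print(sorted(EXTENT))
```

Calling `matching` with the attribute set alone (interval `None`) returns the five cars of the binary closure, and adding the interval narrows this set back to A1, mirroring the intersection shown on the slide.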

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: ACM Computing Classification taxonomy, rooted at ⊤, over the classes C1, C2 and C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
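A constant-time range minimum query can be illustrated with a sparse table in Python. Note the swapped-in technique: a sparse table needs O(n log n) preprocessing, so it is a simpler stand-in for the O(n) preprocessing scheme mentioned above, not that scheme itself. The function names are illustrative, and positions are 0-based here while the slide counts from 1.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the minimum of values[lo..hi] (inclusive), in O(1)."""
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping windows of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]  # the array shown on the slide
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 0, 4) + 1)  # 1-based position of the minimum
```

Applied to the depth array of a tree traversal, the index returned by `rmq` identifies the least common ancestor of the two nodes at the ends of the range.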

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

The RMQs are computed pair by pair, and each answer is added to the result:

• $RMQ(4, 4) = 4$, result so far $\{4\}$
• $RMQ(4, 18) = 15$, result so far $\{4, 15\}$
• $RMQ(20, 18) = 19$, result so far $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
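The naive pairwise computation can be replayed directly on the arrays shown above. This is a minimal sketch: `D` and `NODES` are copied from the slide, while the helper names `rmq` and `naive_candidates` are hypothetical, and the range minimum is found by a plain linear scan rather than a preprocessed structure.

```python
# Depth array D and node labels copied from the slide (positions are 1-based).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤").split()


def rmq(i, j):
    """1-based position of the leftmost minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def naive_candidates(pos_a, pos_b):
    """One RMQ per pair (a, b), i.e. |A| * |B| queries in total."""
    return sorted({rmq(a, b) for a in pos_a for b in pos_b})


candidates = naive_candidates([4, 12, 20], [4, 18, 22])
print(candidates)
print([NODES[p - 1] for p in candidates])
```

The candidate positions are 4, 8, 15, 19 and 21, i.e. the nodes C1, C12, ⊤, C13 and C13. Keeping only the most specific nodes leaves C1 and C13.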


An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain.

Taxonomy:

⊤
├─ C12
│  ├─ C10
│  │  ├─ C1
│  │  └─ C2
│  └─ C11
│     ├─ C4
│     └─ C5
└─ C6
   └─ C13
      ├─ C7
      ├─ C8
      └─ C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
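The scaling-based intersection can be sketched as follows. The parent links are read off the taxonomy of the running example; the helper names (`ancestors_or_self`, `filter_of`, `most_specific`) are hypothetical and only illustrate the idea.

```python
# Parent links of the example taxonomy (the root ⊤ has no parent).
PARENT = {
    "C12": "⊤", "C6": "⊤",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """The node itself followed by all of its ancestors up to the root."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def filter_of(antichain):
    """Filter generated by an antichain: every element above one of its members."""
    return {m for c in antichain for m in ancestors_or_self(c)}


def most_specific(nodes):
    """Drop every node that is a proper ancestor of another node in the set."""
    return {
        n for n in nodes
        if not any(n != m and n in ancestors_or_self(m) for m in nodes)
    }


common = filter_of({"C1", "C5", "C8"}) & filter_of({"C1", "C7", "C9"})
print(sorted(common), sorted(most_specific(common)))
```

Each filter can contain up to $|T|$ nodes, which is why this route is more expensive than working on the antichains directly.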


Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, since every pair of elements from $A$ and $B$ is queried.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.


About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.


  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 23: Interactive Knowledge Discovery over Web of Data

Pattern Structures

[Ganter and Kuznetsov 2001]

• A pattern structure $(G, (D, \sqcap), \delta)$ is composed of
  - $G$, a set of objects
  - $(D, \sqcap)$, the set of descriptions with a similarity operation $\sqcap$ over them
  - $\delta$, a mapping such that $\delta(g) \in D$ describes object $g$

      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5

• $\delta(g_1) = \langle [5, 5], [7, 7], [6, 6] \rangle$

• $\delta(g_2) = \langle [6, 6], [8, 8], [4, 4] \rangle$

• $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

• $\delta(g_1) \sqcap \delta(g_2) = \langle [5, 6], [7, 8], [4, 6] \rangle$
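The interval similarity operation is easy to replay on the table above. A minimal sketch, with hypothetical helper names (`description`, `interval_meet`, `meet`):

```python
def description(row):
    """Turn a row of numbers into a tuple of degenerate intervals [v, v]."""
    return [(v, v) for v in row]


def interval_meet(x, y):
    """Similarity of two intervals: the smallest interval containing both."""
    (a1, b1), (a2, b2) = x, y
    return (min(a1, a2), max(b1, b2))


def meet(d1, d2):
    """Component-wise similarity of two interval descriptions."""
    return [interval_meet(x, y) for x, y in zip(d1, d2)]


delta_g1 = description([5, 7, 6])
delta_g2 = description([6, 8, 4])
print(meet(delta_g1, delta_g2))
```

The printed description corresponds to $\langle [5, 6], [7, 8], [4, 6] \rangle$, as on the slide.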


Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as

• The maximal description representing the similarity of a set of objects:

$A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2\}$
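Both derivation operators can be sketched for the three-object interval example. The function names `intent`, `extent` and `subsumed` are assumptions for illustration; subsumption $d \sqsubseteq e$ is tested as $d \sqcap e = d$, and `intent` expects a non-empty set of objects.

```python
from functools import reduce

# Numerical table of the running example (objects g1..g3, attributes m1..m3).
DATA = {"g1": [5, 7, 6], "g2": [6, 8, 4], "g3": [4, 8, 5]}


def delta(g):
    """Description of an object: one degenerate interval per attribute."""
    return [(v, v) for v in DATA[g]]


def meet(d1, d2):
    """Component-wise interval similarity."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2)]


def subsumed(d, e):
    """d is subsumed by e when their similarity is d itself."""
    return meet(d, e) == d


def intent(objects):
    """Maximal common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d):
    """All objects whose description is covered by d."""
    return {g for g in DATA if subsumed(d, delta(g))}


d = intent({"g1", "g2"})
print(d, sorted(extent(d)))
```

The pair ({g1, g2}, ⟨[5, 6], [7, 8], [4, 6]⟩) is closed under both operators, which is what makes it a pattern concept; g3 is excluded because its value 4 for m1 falls outside [5, 6].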


Complex Data

      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5

Interval Pattern Structures [Kaytoue et al 2011]

Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

Syntactic Tree [Leeuwenberg et al 2015]

Linked Data


Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification


Roadmap

Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization


3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)

U = URIs, B = Blank Nodes, L = Literals
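The position constraints of the definition can be written down as a small check. This is only a sketch: the `Term` class and `is_valid_triple` function are hypothetical and not part of any RDF library; the example terms come from the publication graph used in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # "uri", "blank" or "literal"
    value: str


def is_valid_triple(s, p, o):
    """Check (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        s.kind in {"uri", "blank"}
        and p.kind == "uri"
        and o.kind in {"uri", "blank", "literal"}
    )


paper = Term("uri", "NapoliLS97")
creator = Term("uri", "dc:creator")
author = Term("uri", "Amedeo_Napoli")
title = Term("literal", "title1")

print(is_valid_triple(paper, creator, author))   # subject and object are URIs
print(is_valid_triple(title, creator, author))   # a literal cannot be a subject
```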


Example of RDF Graph for Publications

NapoliLS97 --dc:creator--> Amedeo_Napoli
NapoliLS97 --dc:title--> title1
NapoliLS97 --dc:subject--> Classification

To avoid confusion, objects in FCA will be called entities.


RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of structured attribute sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of this array.

Taxonomy:

⊤
├─ C12
│  ├─ C10
│  │  ├─ C1
│  │  └─ C2
│  └─ C11
│     ├─ C4
│     └─ C5
└─ C6
   └─ C13
      ├─ C7
      ├─ C8
      └─ C9

The array D is filled during a depth-first traversal of the taxonomy, recording the depth of the current node at every step: [0] at ⊤, [0 1] at C12, [0 1 2] at C10, [0 1 2 3] at C1, [0 1 2 3 2] back at C10, and so on until the whole tree has been traversed:

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
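The depth array can be produced by a short recursive traversal. A minimal sketch: the `CHILDREN` map encodes the example taxonomy, and the function name `euler_tour` is an assumption for illustration.

```python
# Children of each inner node of the example taxonomy, in left-to-right order.
CHILDREN = {
    "⊤": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Record (depth, node) on every visit, going down and coming back up."""
    depths, nodes = [], []

    def visit(node, depth):
        depths.append(depth)
        nodes.append(node)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            # Back at the parent after finishing the child's subtree.
            depths.append(depth)
            nodes.append(node)

    visit(root, 0)
    return depths, nodes


depths, nodes = euler_tour("⊤")
first_position = {}
for pos, node in enumerate(nodes, start=1):
    first_position.setdefault(node, pos)

print(depths)
print([first_position[c] for c in ("C1", "C5", "C8")])
```

The first position of each node in the tour is what turns a set of classes such as {C1, C5, C8} into the positions {4, 12, 20} used in the queries.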

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

$RMQ(4, 4) = 4$, $RMQ(4, 12) = 8$, $RMQ(12, 18) = 15$, $RMQ(18, 20) = 19$ and $RMQ(20, 22) = 21$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$
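The improved procedure can be sketched end to end on the same arrays. Two points are assumptions of this sketch rather than statements from the slides: neighbours in the merged list are only compared when they come from different sets (always the case in this example), and the final reduction to an antichain uses first and last tour positions to test ancestry. All function names are hypothetical.

```python
# Depth array D and node labels copied from the slide (positions are 1-based).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤").split()


def rmq(i, j):
    """1-based position of the leftmost minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def consecutive_candidates(pos_a, pos_b):
    """Merge both position lists and query only neighbouring cross-set pairs."""
    merged = sorted([(p, "A") for p in pos_a] + [(p, "B") for p in pos_b])
    result = []
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:  # assumption: same-set neighbours are skipped
            result.append(rmq(p, q))
    return result


def most_specific(positions):
    """Map positions to nodes and drop every proper ancestor of another node."""
    first, last = {}, {}
    for pos, node in enumerate(NODES, start=1):
        first.setdefault(node, pos)
        last[node] = pos
    labels = {NODES[p - 1] for p in positions}

    def is_proper_ancestor(u, v):
        return u != v and first[u] <= first[v] <= last[u]

    return {u for u in labels if not any(is_proper_ancestor(u, v) for v in labels)}


candidates = consecutive_candidates([4, 12, 20], [4, 18, 22])
print(candidates, sorted(most_specific(candidates)))
```

Only five queries are issued here, against nine for the pairwise version, and the outcome is the same antichain {C1, C13}.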


Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

[Figure: pattern concept lattice with concepts K0 to K11, drawn in levels: K0; K1 K2; K4 K3 K5; K8; K10 K7 K6 K9; K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice


Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach


What we did so far


How to allow feedback from domain expert


Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11, drawn in levels: K0; K1 K2; K4 K3 K5; K8; K10 K7 K6 K9; K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs


SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: query graph pattern: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$ : ?keywords $\to$ Classification

[Figure: matching RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]


Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \ldots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$

• $M_2 = \mu(A_{v2}) = \{Web\ Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
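How a VIEW BY clause could turn result rows into a formal context can be sketched as follows. This is an illustration under assumptions, not the LBVA implementation: the function name `view_by` is hypothetical, rows are plain dictionaries, and each attribute is represented as a (variable, value) pair.

```python
# Result rows of the example query (the duplicated row is listed once).
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, entity_var):
    """Group rows by the VIEW BY variable; other bindings become attributes."""
    context = {}
    for row in rows:
        attributes = context.setdefault(row[entity_var], set())
        attributes.update(
            (var, value) for var, value in row.items() if var != entity_var
        )
    return context


by_title = view_by(ROWS, "title")
by_author = view_by(ROWS, "author")
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Switching the VIEW BY variable from the title to the author regroups the same rows around a different set of entities, which is the idea behind views from different perspectives.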


Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> (≤ 1900)]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

French Films

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm, ?x --rdf:type--> Film, ?x --dbo:hasCountry--> France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }


From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Formal context over the attributes a, b (A), c (B), d (C), e (D), f, g (E):

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
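The scaling step can be sketched by treating every (predicate, object) pair as one binary attribute. The triples are the four listed for Person1; the function name `formal_context` and the tuple representation are assumptions for illustration.

```python
# The four triples listed for Person1, written as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def formal_context(triples):
    """Objects are subjects, attributes are (predicate, object) pairs."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


objects, attributes, incidence = formal_context(TRIPLES)
for g in objects:
    row = ["×" if (g, m) in incidence else "." for m in attributes]
    print(g, " ".join(row))
```

Each triple contributes exactly one cross, so the number of crosses in a row equals the number of distinct (predicate, object) pairs recorded for that entity.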


Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$     100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition, $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
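Support and confidence for the running example can be recomputed on a reduced context. The context below is an assumption: it keeps only the attributes a to d and is merely chosen to be consistent with the support and confidence values in the rule table (the columns e, f and g are left out). The helper names are hypothetical.

```python
# Assumed reduced context: Person1..Person5 carry a, b, c, d; Person6 and Person7 only a, b.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def support(itemset):
    """Number of entities whose description contains every attribute of itemset."""
    return sum(1 for attrs in CONTEXT.values() if itemset <= attrs)


def confidence(premise, conclusion):
    """Fraction of entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return sorted(
        g for g, attrs in CONTEXT.items()
        if premise <= attrs and not conclusion <= attrs
    )


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(confidence({"c", "d"}, {"a", "b"}))
print(needs_completion({"a", "b"}, {"c", "d"}))
```

The rule from {c, d} to {a, b} holds with confidence 1, while the converse reaches 5/7; the two entities that break it are the candidates for completion.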


Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → (Yes) → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.


Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS = Front_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        0.7          5982

Dataset characteristics


Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation is used as the ground truth: each list of implications has a 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)


RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  - Completing RDF-Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph for the entity 350GT, with dc:subject edges to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date)]

Why Pattern Structures

Binary contexts K_A to K_E over the attributes a b c d e f g, followed by K_dbo:productionStartYear:

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli, 2014]

• Let $p \in P$ be a predicate
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

In the heterogeneous context above, let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"

The proposition of [Carpineto and Romano, 1996]

• Let $K = (G, M, I)$ be a formal context

• An extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: subsumption hierarchy with top element ⊤ over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov, 2001]

• In [Ganter and Kuznetsov, 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in Max(Idl) : x \leq y\}$

• The maximal elements of an order ideal are pairwise not comparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array
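A minimal sketch of an RMQ structure, run on the five-element array of this slide. It uses a sparse table, which is a swapped-in, simpler technique: preprocessing takes $O(n \log n)$ rather than the $O(n)$ quoted on the slide, while each query is still $O(1)$. The function names `build_sparse_table` and `rmq` are illustrative only, and positions are 0-based in the code.

```python
from typing import List


def build_sparse_table(values: List[int]) -> List[List[int]]:
    """table[k][i] is the index of a minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]          # blocks of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values: List[int], table: List[List[int]], lo: int, hi: int) -> int:
    """0-based index of a minimal value in values[lo..hi] (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    k = (hi - lo + 1).bit_length() - 1   # largest power of two fitting in the range
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 0, 4) + 1)   # 1-based position of the minimum: 3
```

The two overlapping power-of-two blocks cover the whole query range, which is why a query needs only two table lookups.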

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the positions in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node         ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D            0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

An RMQ is computed for each pair made of one position from $A$ and one position from $B$, for example:

RMQ(4, 4) = 4, giving {4}
RMQ(4, 18) = 15, giving {4, 15}
RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
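The scaling on this slide can be sketched as follows. The parent map encodes the example tree as read off the traversal array used on the RMQ slides (root written `TOP` here); `filter_of` and `most_specific` are illustrative names, not taken from the talk.

```python
# Child -> parent map of the example taxonomy (root "TOP" has no parent).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes equal to, or above, at least one element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def most_specific(nodes):
    """Drop every node that lies strictly above another node of the set."""
    covered = set()
    for node in nodes:
        covered |= filter_of({node}) - {node}
    return set(nodes) - covered


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(most_specific(common)))   # ['C1', 'C13']
```

Each filter can contain up to every node of the tree, which is the $O(|T|)$ cost this approach pays.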

Complexity

Complexity of Naive Approach

The number of RMQs, one per pair of elements from $A$ and $B$, is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements, and thus the number of RMQs, is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Pattern Structures

The Galois connection for $(G, (D, \sqcap), \delta)$ is defined as:

• The maximal description representing the similarity of a set of objects:

$A^{\square} = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

• The maximal set of objects sharing a given description:

$d^{\square} = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$

Pattern Concept

• $\{g_1, g_2\}^{\square} = \langle [5, 6], [7, 8], [4, 6] \rangle$

• $\langle [5, 6], [7, 8], [4, 6] \rangle^{\square} = \{g_1, g_2, g_5\}$

Complex Data

     m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5

Interval Pattern Structures [Kaytoue et al., 2011]

Graphs and Molecular Structures [Kuznetsov and Samokhin, 2005]

Syntactic Trees [Leeuwenberg et al., 2015]

Linked Data
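A small sketch of the two derivation operators for interval patterns, applied to the numerical table of this slide. The similarity of two descriptions is taken here as the component-wise smallest enclosing interval, which is the usual reading of interval pattern structures; the names `describe` and `extent` are illustrative.

```python
# Numerical table from the slide: entity -> (m1, m2, m3).
TABLE = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def describe(objects):
    """Common description of a set of objects: one [min, max] interval per column."""
    rows = [TABLE[g] for g in objects]
    return tuple((min(col), max(col)) for col in zip(*rows))


def extent(pattern):
    """All objects whose values fall inside every interval of the pattern."""
    return {
        g
        for g, row in TABLE.items()
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, pattern))
    }


pattern = describe({"g1", "g2"})
print(pattern)                   # ((5, 6), (7, 8), (4, 6))
print(sorted(extent(pattern)))   # ['g1', 'g2', 'g5']
```

Applying `describe` to the resulting extent returns the same pattern, so the pair is closed on both sides.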

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object

[Figure: a triple drawn as Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$); U = URIs, B = Blank Nodes, L = Literals]

Example of RDF Graph for Publications

[Figure: RDF graph in which NapoliLS97 has dc:creator Amedeo_Napoli, dc:title title1 and dc:subject Classification]

To avoid confusion, objects in FCA will be called entities

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy $T$, with top element ⊤ over the classes C1, C2 and C4 to C15]

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets

• Scaling

• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano, 1996]
  - Proposition of [Ganter and Kuznetsov, 2001] for structured attributes

• Range Minimum Query: An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

The array D is filled step by step along a depth-first traversal of the tree: each time a node is visited, its depth is appended (⊤ gives 0, then C12 gives 1, C10 gives 2, C1 gives 3, back to C10 gives 2, and so on)

Position     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node         ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D            0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
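The construction of the array D can be sketched as a depth-first traversal that records a node every time it is entered or returned to. The child lists below encode the example tree (root written `TOP`); `euler_tour` is an illustrative name.

```python
# Parent -> ordered children of the example taxonomy.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (nodes, depths) recorded along a depth-first traversal."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            # Coming back up: record the parent again.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour("TOP")
print(depths)
# First 1-based position of a few leaves in the tour.
print({leaf: nodes.index(leaf) + 1 for leaf in ["C1", "C5", "C8", "C7", "C9"]})
```

The minimum depth between the positions of two nodes is reached at their lowest common ancestor, which is what turns an LCA question into an RMQ.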

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node         ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D            0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$

$B = \{4, 18, 22\}$

$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}
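A sketch of the improved intersection, under two assumptions that are not spelled out on the slide: RMQs are only taken between consecutive positions that come from different antichains (this is how the A/B tags are read here), and the final reduction keeps only the most specific nodes. The `rmq` helper is a plain linear scan standing in for a constant-time RMQ structure; all names are illustrative and positions are 0-based in the code.

```python
# Traversal array of the example taxonomy (root written "TOP").
NODES = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

FIRST = {}
for index, name in enumerate(NODES):
    FIRST.setdefault(name, index)                           # first occurrence
LAST = {name: index for index, name in enumerate(NODES)}    # last occurrence


def rmq(lo, hi):
    """Index of a minimal depth in DEPTH[lo..hi] (linear-scan stand-in)."""
    return min(range(lo, hi + 1), key=DEPTH.__getitem__)


def is_above(x, y):
    """True if x is y itself or an ancestor of y in the tree."""
    return FIRST[x] <= FIRST[y] <= LAST[x]


def intersect(a, b):
    """Most specific common generalizations of the antichains a and b."""
    tagged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[x], "B") for x in b])
    candidates = set()
    for (lo, tag_lo), (hi, tag_hi) in zip(tagged, tagged[1:]):
        if tag_lo != tag_hi:            # assumption: only mixed A/B pairs
            candidates.add(NODES[rmq(lo, hi)])
    return {x for x in candidates
            if not any(x != y and is_above(x, y) for y in candidates)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))   # ['C1', 'C13']
```

Only one pass over the merged, sorted positions is needed, which is where the $O(|A| + |B|)$ number of RMQs comes from.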

Resulting Pattern Concept Lattice

[Figure: entity descriptions and ACM Computing Classification taxonomy, as on the slide "RDF to Entity-Descriptions"]

[Figure: pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling

• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$

• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from a domain expert

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11 and the table of their extents and intents, as on the slide "Resulting Pattern Concept Lattice"]

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, with ?paper linked by dc:creator to ?author, by dc:title to ?title and by dc:subject to ?keywords]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: matching RDF graph, in which NapoliLS97 has dc:creator Amedeo_Napoli, dc:title title1 and dc:subject Classification]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \ldots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$

• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014
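How a VIEW BY clause could turn a result table into a formal context can be sketched as below. This is only an illustration of the idea, not the implementation behind the talk: the entity column supplies the objects, and every value in the other columns becomes an attribute. The rows are those of the result table on this slide, and `view_by` is a hypothetical helper.

```python
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, entity_column):
    """Map each value of entity_column to its set of (column, value) attributes."""
    entity_index = columns.index(entity_column)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[entity_index], set())
        for index, value in enumerate(row):
            if index != entity_index:
                attributes.add((columns[index], value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title))    # ['title1', 'title2']
print(sorted(by_author))   # ['author1', 'author2']
```

Changing only the entity column gives a different set of objects over the same answers, hence a different lattice for the same query.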

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x with rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x with dcterms:subject FrenchFilm, rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Predicates:  A (a, b)  B (c)  C (d)  D (e)  E (f, g)
Objects:     a b c d e f g

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                          Confidence  Support  Meaning
$d \Rightarrow c$             100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$             100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$          100%        2        All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$       100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$       71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
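
The confidence figure above can be checked with a few lines of code. The sketch below is only an illustration: the context is a hypothetical one chosen to be consistent with the reported supports (the exact column placement of the crosses is not recoverable from the slide, and the birth place attributes f and g are left out), and the helper names are invented for the example.

```python
# Hypothetical context consistent with the reported rule supports; f and g are omitted.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def support(context, attrs):
    """Number of entities whose description contains all of `attrs`."""
    return sum(1 for desc in context.values() if attrs <= desc)


def confidence(context, premise, conclusion):
    """Fraction of entities satisfying the premise that also satisfy the conclusion."""
    denom = support(context, premise)
    return support(context, premise | conclusion) / denom if denom else 0.0


def violators(context, premise, conclusion):
    """Entities that satisfy the premise but not the conclusion: candidates for completion."""
    return sorted(g for g, d in context.items() if premise <= d and not conclusion <= d)


conf_ab_cd = confidence(CONTEXT, {"a", "b"}, {"c", "d"})
conf_cd_ab = confidence(CONTEXT, {"c", "d"}, {"a", "b"})
to_complete = violators(CONTEXT, {"a", "b"}, {"c", "d"})
```

Under these assumptions the rule from {a, b} to {c, d} holds for 5 of 7 entities, and the two violating entities are exactly the ones flagged for completion.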

Possible Scenario [Alam et al., IJCAI 2015]

[Flowchart: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

              Cars              Videogames   Smartphones       Countries
Restriction   dc:subject        dc:subject   dc:subject        rdf:type
              dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates    rdf:type          rdf:type     rdf:type          rdf:type
              dc:subject        dc:subject   dc:subject        dc:subject
              bodyStyle         cp           manufacturer      language
              transmission      developer    operativeSystem   governmentType
              assembly          requirement  developer         leaderType
              designer          genre        cpu               foundingDate
              layout            releaseDate                    gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

               Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  1732    1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may contain different data types and structures, including dates, numbers, collections, strings

• To deal with such data we use Pattern Structures

[Graph: 350GT dc:subject dbpc:Sports_cars; 350GT dc:subject dbpc:Lamborghini_vehicles; 350GT rdf:type dbo:Automobile; 350GT dbo:manufacturer dbp:Lamborghini; 350GT dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Contexts:    K_A (a, b)  K_B (c)  K_C (d)  K_D (e)  K_E (f, g)  K_dbo:productionStartYear
Attributes:  a b c d e f g

Reventon            × × × × × ×   $\langle [2008, 2008] \rangle$
Countach            × × × × ×     $\langle [1974, 1974] \rangle$
350GT               × × × × ×     $\langle [1963, 1963] \rangle$
400GT               × × × ×       $\langle [1965, 1965] \rangle$
Islero              × × × ×       $\langle [1967, 1967] \rangle$
Veneno              × ×           $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \times_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$ in the heterogeneous context with numeric values, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
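
The numeric part of this closure can be reproduced with a small interval pattern sketch. It assumes the usual interval meet (the smallest interval covering both arguments) and uses the production start years from the heterogeneous context; the function names are invented for the example, and an entity without a value is treated as having no numeric description.

```python
# Production start years from the heterogeneous context; None marks the missing value.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    "Aventador Roadster": None,
}


def describe(entities):
    """Meet of the point intervals [y, y]: the smallest interval covering all entities.

    Returns None when the set is empty or some entity has no numeric value.
    """
    values = [YEARS[g] for g in entities]
    if not values or any(v is None for v in values):
        return None
    return (min(values), max(values))


def extent(interval):
    """Entities whose production year falls inside the interval."""
    lo, hi = interval
    return {g for g, y in YEARS.items() if y is not None and lo <= y <= hi}


A1 = {"350GT", "400GT", "Islero"}
A1_description = describe(A1)
A1_closure = extent(A1_description)
```

Here the closure of A1 under the numeric derivation is A1 itself, which is why intersecting it with the larger set obtained from the binary attributes gives A1 back.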

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x           x   x
s3                  x   x
s4                  x               x   x
s5                                      x   x

[Taxonomy: ⊤ has children C12 and C15; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C15 has children C13 (leaf C6) and C14 (leaves C7, C8, C9)]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• The maximal elements of an order ideal are pairwise not comparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$ the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$ the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
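
To make the query concrete, here is a minimal sketch of an RMQ structure over the array above. It uses a sparse table, which needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted on the slide; this simpler technique is swapped in for illustration only. The class name is an assumption, positions are 1-based to match the slide, and ties are resolved towards the leftmost position.

```python
class SparseTableRMQ:
    """Range minimum query returning the 1-based position of the minimum."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] = 0-based index of the minimum of values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, lo, hi):
        """Position (1-based) of the minimum among positions lo..hi, inclusive."""
        if lo > hi:
            lo, hi = hi, lo
        i, j = lo - 1, hi - 1
        k = (j - i + 1).bit_length() - 1
        # Two overlapping windows of length 2**k cover the whole range.
        left = self.table[k][i]
        right = self.table[k][j - (1 << k) + 1]
        best = left if self.values[left] <= self.values[right] else right
        return best + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
whole_range = rmq.query(1, 5)
```

Each query combines two precomputed windows, so its cost does not depend on the length of the range.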

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

An RMQ is computed for each pair of positions, one from $A$ and one from $B$, for example:

$RMQ(4, 4) = 4$, partial result $\{4\}$
$RMQ(4, 18) = 15$, partial result $\{4, 15\}$
$RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$


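An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Taxonomy: ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$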
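
The filter computation above can be replayed directly. In the sketch below the parent links are read off the Euler tour used in the RMQ slides, `"TOP"` stands for ⊤, and the function names are assumptions made for this example rather than names from the original work.

```python
# Parent links of the example taxonomy, read off the Euler tour in the slides.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def upset(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the upsets of the antichain's elements."""
    return set().union(*(upset(n) for n in antichain))


def minimal_elements(nodes):
    """Most specific nodes: those with no proper descendant inside the set."""
    return {n for n in nodes if not any(m != n and n in upset(m) for m in nodes)}


def meet_by_scaling(a, b):
    """Intersect the two filters and keep only the most specific common elements."""
    return minimal_elements(filter_of(a) & filter_of(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
result = meet_by_scaling(A, B)
```

Both filters are materialised in full here, which is what makes this route cost time proportional to the size of the taxonomy.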

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| |B|)$, one per pair of elements

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Complex Data

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Interval Pattern Structures [Kaytoue et al. 2011]

Graphs and Molecular Structure [Kuznetsov and Samokhin 2005]

Syntactic Tree [Leeuwenberg et al. 2015]

Linked Data

Problem Statement

• Linked Data follows a distributed architecture

• Some resources only contain RDF triples

• Some resources only contain schema information

• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema

• Allow simultaneous access to RDF triples and RDF Schema

• Allow the user to interact with the resulting classification


3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object

[Diagram: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$)]

U = URI, B = Blank Nodes, L = Literal


Example of RDF Graph for Publications

[Graph: NapoliLS97 dc:title title1; NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:subject Classification]

To avoid confusion, we will refer to objects in FCA as entities

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Taxonomy: ⊤ has children C12 and C15; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C15 has children C13 (leaf C6) and C14 (leaves C7, C8, C9)]

ACM Computing Classification Taxonomy $|T|$

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes

• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, Range Minimum Query finds the minimal value in a sub-array

[Taxonomy: ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
$RMQ(4, 4) = 4$, $RMQ(4, 12) = 8$, $RMQ(12, 18) = 15$, $RMQ(18, 20) = 19$ and $RMQ(20, 22) = 21$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$

Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
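The merge-and-query procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tour and depth arrays are copied from the slide, the name "TOP" stands for the root ⊤, the helper names (`rmq`, `subsumes`, `intersect`) are hypothetical, and the RMQ is a plain linear scan rather than a constant-time structure.

```python
# Euler tour of the taxonomy tree and depth of each visited node, copied
# from the slide (positions are 1-based there). "TOP" stands for the root.
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
         1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First tour position (1-based) of every node.
FIRST = {}
for pos, node in enumerate(TOUR, start=1):
    FIRST.setdefault(node, pos)


def rmq(i, j):
    """1-based position of the leftmost minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def subsumes(u, v):
    """True if node u is an ancestor of, or equal to, node v."""
    return TOUR[rmq(FIRST[u], FIRST[v]) - 1] == u


def intersect(a_pos, b_pos):
    """Intersect two antichains given as lists of leaf positions in the tour."""
    merged = sorted(a_pos + b_pos)
    # One RMQ per pair of consecutive positions in the merged list.
    candidates = {TOUR[rmq(x, y) - 1] for x, y in zip(merged, merged[1:])}
    # Keep only the most specific candidates (drop strict ancestors of others).
    return {u for u in candidates
            if not any(u != v and subsumes(u, v) for v in candidates)}


result = intersect([4, 12, 20], [4, 18, 22])
```

With the slide's input, `result` contains the two classes C1 and C13, which correspond to positions 4 and 21.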

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy tree rooted at ⊤ over the classes C1 to C15]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP             5293   33207   33198         10134     45s     21s
Biomedical Data  63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset  entities  attributes  |G|   |T|    |Leaves(T)|  |L|       t_RMQ     t_scaling
BK       96        5           35    626    10           840897    37 sec    42 sec
LO       16        7           16    224    26           1875      0043 sec  0088 sec
NT       131       3           131   140    6            128624    36 sec    68 sec
PO       60        16          22    1236   58           416837    49 sec    57 sec
PT       5000      49          22    4084   60           452316    50 sec    38 sec
PW       200       11          94    436    21           1148656   60 sec    49 sec
PY       74        28          36    340    53           771569    46 sec    40 sec
QU       2178      4           44    8212   8            783013    28 sec    30 sec
TZ       186       61          31    626    88           650041    58 sec    43 sec
VY       52        4           52    202    15           202666    59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)' ⊆ (m2)' means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach
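Inter-ordinal scaling, mentioned above, turns a numerical attribute into binary attributes of the form "≤ t" and "≥ t" for every observed value t. The sketch below is a minimal illustration under stated assumptions: the function name `interordinal_scale` and the sample values are hypothetical and are not taken from the Bilkent datasets.

```python
def interordinal_scale(values):
    """Map each entity to binary attributes '<= t' and '>= t' for every observed value t."""
    thresholds = sorted(set(values.values()))
    context = {}
    for entity, value in values.items():
        lower = {f"<= {t}" for t in thresholds if value <= t}
        upper = {f">= {t}" for t in thresholds if value >= t}
        context[entity] = lower | upper
    return context


# Hypothetical numerical column with three entities.
scaled = interordinal_scale({"g1": 1, "g2": 2, "g3": 3})
```

Each entity ends up with one attribute per threshold on at least one side, which is why the scaled context grows quickly with the number of distinct values.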

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern in which ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping μ : V → U

Given U and V, a mapping μ is a partial function μ : V → U, e.g., μ1 : ?keywords → Classification

[Figure: instantiated graph pattern in which NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]
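A mapping μ can be pictured as a dictionary from query variables to terms, applied to every triple pattern of the query. The sketch below is illustrative only: the function name `apply_mapping` is hypothetical, variables are assumed to be strings starting with "?", and the bindings reuse the instantiated example from the slide.

```python
def apply_mapping(pattern, mu):
    """Replace every variable bound in mu; unbound terms are left unchanged (mu is partial)."""
    return [tuple(mu.get(term, term) for term in triple) for triple in pattern]


# Graph pattern of the query (variables start with '?').
pattern = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

# One solution mapping, taken from the instantiated graph on the slide.
mu = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?title": "title1",
    "?keywords": "Classification",
}

instantiated = apply_mapping(pattern, mu)
```

Because μ is a partial function, a variable missing from `mu` simply stays in the pattern.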

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• Ev = {?title}
• Av = {?author, ?keyword}
• G = μ(Ev) = {title1, title2, ...}
• M1 = μ(Av1) = {author1, author2, ...}
• M2 = μ(Av2) = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
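The effect of VIEW BY on a result table can be sketched as follows: the chosen column supplies the objects G, and every other column supplies one attribute set together with its incidence. This is a minimal illustration, not the authors' implementation; the function name `view_by` is hypothetical and the rows are copied from the result table on the slide.

```python
from collections import defaultdict


def view_by(rows, key):
    """Return the objects of column `key` and one object-to-attributes map per other column."""
    contexts = {}
    for col in range(len(rows[0])):
        if col == key:
            continue
        incidence = defaultdict(set)
        for row in rows:
            incidence[row[key]].add(row[col])
        contexts[col] = dict(incidence)
    objects = sorted({row[key] for row in rows})
    return objects, contexts


# Columns: 0 = ?title, 1 = ?keyword, 2 = ?author (rows from the slide).
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]

titles, by_title = view_by(rows, 0)    # VIEW BY ?title
authors, by_author = view_by(rows, 2)  # VIEW BY ?author
```

Changing the `key` column is all it takes to switch the perspective from titles to authors.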

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern in which x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: x is linked to FrenchFilm by dcterms:subject; alternatively, x has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

[Table: formal context with the rows Person1 to Person7 and the columns a to g grouped under the predicates A to E; Person1 has six crosses, Person2 and Person3 five, Person4 and Person5 four, Person6 and Person7 two]

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
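The construction described on this slide can be sketched as follows: subjects become objects of the context, each (predicate, object) pair becomes an attribute, and each triple becomes one cross. The function name `to_formal_context` is hypothetical, and only the four triples listed on the slide are used.

```python
def to_formal_context(triples):
    """Build (objects, attributes, incidence) from <subject, predicate, object> triples."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}  # one cross per triple
    return objects, attributes, incidence


triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]

objects, attributes, incidence = to_formal_context(triples)
```

Pairing the predicate with the object keeps attributes such as a and b, which share the predicate dc:subject, distinct.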

Getting definitions from implications and association rules

• If X ⇒ Y and Y ⇒ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⇒ Y holds and Y → X has high confidence, we may be in the presence of incompleteness in the data

Rule         Confidence  Support  Meaning
d ⇒ c        100%        5        Every scientist has won a Turing Award
c ⇒ d        100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c, d     100%        2        All the people having the field computer science are Turing Award winner scientists
c, d ⇒ a, b  100%        5        All the Scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won a Turing Award

Association rules for the running example

• Example
  - c, d ⇒ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition a, b ≡ c, d, i.e., a, b ⇒ c, d and c, d ⇒ a, b
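The confidence of the rule {a, b} → {c, d} and the entities that need completion can be recomputed with a short sketch. The context below is an assumption: it keeps only the attributes a to d and is chosen to agree with the supports stated in the table (five entities with c and d, seven with a and b); the helper names are hypothetical.

```python
# Assumed context restricted to the attributes a, b, c, d.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(x, y):
    """conf(x -> y) = |extent(x and y)| / |extent(x)| (extent(x) assumed non-empty)."""
    return len(extent(x | y)) / len(extent(x))


def incomplete_entities(x, y):
    """Entities satisfying x but not y: candidates for completion if x and y define each other."""
    return extent(x) - extent(y)


conf_ab_cd = confidence({"a", "b"}, {"c", "d"})
conf_cd_ab = confidence({"c", "d"}, {"a", "b"})
to_complete = incomplete_entities({"a", "b"}, {"c", "d"})
```

Here `conf_cd_ab` is 1.0 while `conf_ab_cd` is 5/7, so the two entities in `to_complete` are exactly the ones that block the definition.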

Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames    Smartphones        Countries
Restriction  dc:subject        dc:subject    dc:subject         rdf:type
             dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones   Country
Predicates   rdf:type          rdf:type      rdf:type           rdf:type
             dc:subject        dc:subject    dc:subject         dc:subject
             bodyStyle         cp            manufacturer       language
             transmission      developer     operativeSystem    governmentType
             assembly          requirement   developer          leaderType
             designer          genre         cpu                foundingDate
             layout            releaseDate                      gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  1732    1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., X → Y and Y → X are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relation, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph of 350GT with dc:subject dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini and dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Entity              Crosses in K_A to K_E (a to g)   K_dbo:productionStartYear
Reventon            6                                ⟨[2008, 2008]⟩
Countach            5                                ⟨[1974, 1974]⟩
350GT               5                                ⟨[1963, 1963]⟩
400GT               4                                ⟨[1965, 1965]⟩
Islero              4                                ⟨[1967, 1967]⟩
Veneno              2                                ⟨[2012, 2012]⟩
Aventador Roadster  2                                -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate p ∈ P
  - range(p) ⊆ U: p is an object relation
  - range(p) ⊆ L: p is a literal relation
• Pattern Structures K_p = (G, (D_p, ⊓), δ_p)
  - (D_p, ⊑) is an ordered set of descriptions defined for the elements in range(p)
  - δ_p maps entities g ∈ G to their descriptions in D_p
• Heterogeneous Pattern Structures (G, H, Δ)
  - H = ⨉ D_p (p ∈ P) is the Cartesian product of all the description sets D_p
  - Δ maps an entity g ∈ G to a tuple where each component corresponds to a description in a set D_p

Heterogeneous Pattern Structures

Let A1 = {350GT, 400GT, Islero}, then

• (A1)°°
  - (A1)° = {a, b, c, d}
  - {a, b, c, d}° = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)°° = {Reventon, Countach, 350GT, 400GT, Islero}
• (A1)''
  - (A1)' = [1963 - 1967] and [1963 - 1967]' = A1
  - (A1)'' = A1
• (A1)⋄⋄ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1
• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"
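The closure computed on this slide combines a closure on the binary part with a closure on the interval part and intersects the two extents. The sketch below reproduces that calculation under stated assumptions: the binary descriptions are hypothetical (the slide only fixes that the first five cars share a, b, c, d; extra crosses and the two attributes of Veneno and Aventador Roadster are not recoverable), the years come from the table, and all helper names are illustrative.

```python
# Assumed binary descriptions; only the shared attributes a, b, c, d matter here.
BINARY = {car: {"a", "b", "c", "d"}
          for car in ("Reventon", "Countach", "350GT", "400GT", "Islero")}
BINARY.update({"Veneno": {"a", "b"}, "Aventador Roadster": {"a", "b"}})

# Production start years from the table (Aventador Roadster has no value).
YEAR = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
        "400GT": 1965, "Islero": 1967, "Veneno": 2012}


def common_attributes(entities):
    """Binary attributes shared by all entities."""
    return set.intersection(*(BINARY[e] for e in entities))


def entities_with(attrs):
    """Entities having all the given binary attributes."""
    return {e for e, desc in BINARY.items() if attrs <= desc}


def interval(entities):
    """Smallest interval covering the years of the entities (all must have a year)."""
    years = [YEAR[e] for e in entities]
    return (min(years), max(years))


def entities_in(iv):
    """Entities whose year lies inside the interval."""
    low, high = iv
    return {e for e, y in YEAR.items() if low <= y <= high}


def closure(entities):
    """Heterogeneous closure: intersect the binary and the interval closures."""
    return entities_with(common_attributes(entities)) & entities_in(interval(entities))


A1 = {"350GT", "400GT", "Islero"}
closed = closure(A1)
```

With these inputs the binary closure alone returns five cars, and it is the interval [1963, 1967] that narrows the extent back to A1.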

The proposition of [Carpineto and Romano 1996]

• Let K = (G, M, I) be a formal context
• Extended set of attributes M* organized in a subsumption hierarchy based on a partial ordering ≤M*
• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x   .   .   .   x   .   .
s2          .   .   .   .   x   .   x   x
s3          .   .   x   x   .   .   .   .
s4          .   .   x   .   .   x   x   .
s5          .   .   .   .   .   .   x   x

[Figure: taxonomy tree rooted at ⊤ over the classes C1 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set (M, ≤M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M : x ≤ y}
• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
• In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set
• For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2 : m ≤ ac1 and m ≤ ac2}

Reduction of LCS to RMQ

• The computation of the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and in O(1) time per query (where n is the number of elements in the array)
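The reduction can be sketched by building the Euler tour of a tree, recording the depth of every visited node, and answering LCS queries with an RMQ over the depth array. The code below is an illustration under stated assumptions: the tree is the taxonomy used in the later slides (with "TOP" standing for ⊤), all names are hypothetical, and the RMQ uses a sparse table, a simpler swapped-in structure with O(n log n) preprocessing and O(1) queries rather than the O(n) preprocessing quoted on the slide.

```python
TREE = {"TOP": ["C12", "C6"], "C12": ["C10", "C11"], "C10": ["C1", "C2"],
        "C11": ["C4", "C5"], "C6": ["C13"], "C13": ["C7", "C8", "C9"]}


def euler_tour(tree, root):
    """Return the Euler tour of the tree and the depth of each visited node."""
    tour, depth = [], []

    def visit(node, d):
        tour.append(node)
        depth.append(d)
        for child in tree.get(node, []):
            visit(child, d + 1)
            tour.append(node)  # come back to the parent after each child
            depth.append(d)

    visit(root, 0)
    return tour, depth


def build_sparse_table(depth):
    """table[k][i] = index of a minimal value in depth[i : i + 2**k]."""
    n = len(depth)
    table = [list(range(n))]
    span = 1
    while 2 * span <= n:
        prev = table[-1]
        row = []
        for i in range(n - 2 * span + 1):
            left, right = prev[i], prev[i + span]
            row.append(left if depth[left] <= depth[right] else right)
        table.append(row)
        span *= 2
    return table


def rmq(table, depth, i, j):
    """0-based index of a minimal value in depth[i..j] (inclusive, i <= j)."""
    k = (j - i + 1).bit_length() - 1
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    return left if depth[left] <= depth[right] else right


TOUR, DEPTH = euler_tour(TREE, "TOP")
TABLE = build_sparse_table(DEPTH)
FIRST = {}
for index, node in enumerate(TOUR):
    FIRST.setdefault(node, index)


def lcs(u, v):
    """Least common subsumer of two nodes of the tree."""
    i, j = sorted((FIRST[u], FIRST[v]))
    return TOUR[rmq(TABLE, DEPTH, i, j)]
```

For the five-element array on the slide, the same `rmq` returns the 0-based index 2, i.e., position 3.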

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9} we have A = {4, 12, 20} and B = {4, 18, 22}

Pos    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e., the antichain {C1, C13}

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

⊤
  C12
    C10
      C1, C2
    C11
      C4, C5
  C6
    C13
      C7, C8, C9

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
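The scaling-based intersection can be sketched directly from the definition: expand each antichain into its filter, intersect the filters as plain sets, and keep the most specific elements of the result. The parent map below encodes the tree of this slide with "TOP" standing for ⊤; the helper names are hypothetical.

```python
PARENT = {"C12": "TOP", "C6": "TOP", "C10": "C12", "C11": "C12",
          "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
          "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13"}


def upset(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the upsets of the elements of the antichain."""
    return set().union(*(upset(n) for n in antichain))


def antichain_of(filt):
    """Most specific elements: those that are not an ancestor of another element."""
    return {n for n in filt
            if not any(m != n and n in upset(m) for m in filt)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
meet = antichain_of(common)
```

The intermediate set `common` has six elements for this input, which illustrates why filters can be much larger than the antichains they come from.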

Complexity

Complexity of Naive Approach

The number of RMQs is O(|A||B|), one per pair of elements taken from A and B

Complexity of Improved Approach

The cardinality of A ∪ B is at most |A| + |B|, hence the number of pairs of consecutive elements, and thus the number of RMQs, is O(|A| + |B|)

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more expensive than O(|Leaves(T)|) for intersecting antichains directly
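The difference between the two query counts can be made concrete on the running example. The sketch below only enumerates the pairs of positions each approach would pass to an RMQ; the function names are hypothetical.

```python
from itertools import product


def naive_pairs(a, b):
    """All pairs (x, y) with x in A and y in B: |A| * |B| queries."""
    return list(product(a, b))


def merged_pairs(a, b):
    """Pairs of consecutive positions in the sorted merge: |A| + |B| - 1 queries."""
    merged = sorted(a + b)
    return list(zip(merged, merged[1:]))


A = [4, 12, 20]
B = [4, 18, 22]
naive = naive_pairs(A, B)
improved = merged_pairs(A, B)
```

On this input the naive enumeration produces nine pairs and the merged one five, matching the five RMQs listed for the improved approach.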

About complexity of the approach

sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly

sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree

sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction

sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling


Problem Statement

• Linked Data follows a distributed architecture
• Some resources only contain RDF triples
• Some resources only contain schema information
• These resources belong to the same domain but share only the terms

Solution

• Classify RDF triples based on RDF Schema
• Allow simultaneous access to RDF triples and RDF Schema
• Allow the user to interact with the resulting classification


Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization


3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) --predicate ($U$)--> Object ($U \cup B \cup L$), where U = URI, B = Blank Nodes, L = Literal]
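The typing constraint in the definition can be made concrete with a small Python sketch. The class names `URI`, `BlankNode`, `Literal` and `Triple` are hypothetical and not taken from any RDF library; the sketch only checks that each position of a triple holds an allowed kind of term.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    s: Union[URI, BlankNode]            # subject in U ∪ B
    p: URI                              # predicate in U
    o: Union[URI, BlankNode, Literal]   # object in U ∪ B ∪ L

    def __post_init__(self):
        # Enforce (U ∪ B) × U × (U ∪ B ∪ L) at construction time.
        if not isinstance(self.s, (URI, BlankNode)):
            raise TypeError("subject must be a URI or a blank node")
        if not isinstance(self.p, URI):
            raise TypeError("predicate must be a URI")
        if not isinstance(self.o, (URI, BlankNode, Literal)):
            raise TypeError("object must be a URI, a blank node or a literal")


# A triple from the publication example used in the slides.
t = Triple(URI("NapoliLS97"), URI("dc:creator"), URI("Amedeo_Napoli"))
print(t)
```

A literal in subject position, such as `Triple(Literal("x"), URI("dc:title"), Literal("y"))`, is rejected with a `TypeError`.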


Example of RDF Graph for Publications

[Figure: RDF graph. NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:subject--> Classification]

To avoid confusion, we will refer to objects in FCA as entities.


RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

Table: RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification taxonomy T, with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree rooted at ⊤, with nodes C1, C2 and C4 to C13]

Euler tour of the taxonomy, with the depth array D:

Position  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
Node      ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
D         0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0
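The Euler tour and the depth array can be reproduced with a short Python sketch. It assumes the taxonomy is given as a child-to-parent dictionary (`PARENT` and the function names are hypothetical, `"TOP"` stands for ⊤) and answers the range minimum query by a plain linear scan, which is enough to show how the lowest common ancestor (LCA) falls out of the depth array.

```python
# Toy taxonomy as child -> parent; sibling order follows the slide ("TOP" is the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(parent, root="TOP"):
    """Depth-first walk that records a node on entry and after each child."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    tour, depth = [], []

    def visit(node, d):
        tour.append(node)
        depth.append(d)
        for child in children.get(node, []):
            visit(child, d + 1)
            tour.append(node)   # back at the parent after visiting the child
            depth.append(d)

    visit(root, 0)
    return tour, depth


def lca(tour, depth, u, v):
    """LCA = node of minimal depth between the first occurrences of u and v."""
    i, j = sorted((tour.index(u), tour.index(v)))
    k = min(range(i, j + 1), key=depth.__getitem__)   # linear-scan RMQ
    return tour[k]


tour, depth = euler_tour(PARENT)
print(depth)
print(lca(tour, depth, "C1", "C5"))
```

Python indices are 0-based, so position 4 on the slide corresponds to `tour[3]`.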


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?


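Positions of the elements in the Euler tour:

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ over consecutive positions:

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$: positions 8 ($C_{12}$) and 15 ($\top$) are ancestors of $C_1$, and positions 19 and 21 both denote $C_{13}$.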
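The computation above can be sketched in Python with a sparse table for the range minimum queries. The tour and depth arrays are copied from the slide; the function names are hypothetical, and the sketch assumes that only consecutive positions coming from different sets need to be queried, which is one reading of the improved approach rather than a statement of the authors' implementation.

```python
# Euler tour and depths copied from the slide ("TOP" stands for the root); 0-based indices.
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4", "C11", "C5",
        "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8", "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

FIRST = {}
for pos, node in enumerate(TOUR):
    FIRST.setdefault(node, pos)   # first occurrence of every node


def build_sparse_table(depth):
    """table[j][i] = index of the minimum of depth[i : i + 2**j]."""
    n = len(depth)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[-1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            a, b = prev[i], prev[i + half]
            row.append(a if depth[a] <= depth[b] else b)
        table.append(row)
        j += 1
    return table


TABLE = build_sparse_table(DEPTH)


def rmq(lo, hi):
    """Index of the minimal depth in DEPTH[lo..hi] (inclusive), in O(1)."""
    k = (hi - lo + 1).bit_length() - 1
    a, b = TABLE[k][lo], TABLE[k][hi - (1 << k) + 1]
    return a if DEPTH[a] <= DEPTH[b] else b


def lca(u, v):
    lo, hi = sorted((FIRST[u], FIRST[v]))
    return TOUR[rmq(lo, hi)]


def intersect(a, b):
    """Similarity of two antichains: most specific common ancestors."""
    merged = sorted([(FIRST[x], "A", x) for x in a] + [(FIRST[x], "B", x) for x in b])
    candidates = set()
    for (_, s, x), (_, t, y) in zip(merged, merged[1:]):
        if s != t:                      # assumption: only query across the two sets
            candidates.add(lca(x, y))
    # Drop every candidate that is a strict ancestor of another candidate.
    return {c for c in candidates
            if not any(o != c and lca(c, o) == c for o in candidates)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

The sparse table takes $O(n \log n)$ time to build for a tour of length $n$ and can be reused for every pair of descriptions.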


Resulting Pattern Concept Lattice


[Figure: pattern concept lattice with concepts K0 (top) to K11 (bottom)]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Table: Details of the pattern concept lattice


Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Table: Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Table: Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach


What we did so far


How to allow feedback from domain expert


Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering


Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.



4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: query graph pattern. ?paper --dc:creator--> ?author; ?paper --dc:title--> ?title; ?paper --dc:subject--> ?keywords]


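Mapping $\mu : V \to U$

Given a set of URIs $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification

[Figure: RDF graph matched by the query. NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:subject--> Classification]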



Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{\texttt{?title}\}$
• $A_v = \{\texttt{?author}, \texttt{?keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}, \dots\}$
• $M_1 = \mu(A_{v_1}) = \{\text{author1}, \text{author2}, \dots\}$
• $M_2 = \mu(A_{v_2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
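How a VIEW BY clause could turn the answer table into entities and attributes can be sketched as follows. This is a hedged illustration in Python, not the authors' implementation: `view_by` is a hypothetical helper, the rows are the ones listed on the slide, and each attribute is kept as a (variable, value) pair.

```python
# Answer rows as listed on the slide, in the column order (title, keyword, author).
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, entity_var):
    """Group the rows by `entity_var`; the other variables become attributes."""
    idx = columns.index(entity_var)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[idx], set())
        for name, value in zip(columns, row):
            if name != entity_var:
                attributes.add((name, value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")    # VIEW BY ?title
by_author = view_by(ROWS, COLUMNS, "author")  # VIEW BY ?author
for entity, attributes in sorted(by_title.items()):
    print(entity, sorted(attributes))
```

Changing the grouping variable changes the set of entities G, which is what gives a different perspective over the same answers.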


Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern. ?x --rdf:type--> Person; ?x --dbp:birthPlace--> Berlin; ?x --dbp:birthDate--> a date ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern. ?x --dcterms:subject--> FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

Problem Statement

French Films

[Figure: graph pattern. ?x --dcterms:subject--> FrenchFilm; ?x --rdf:type--> Film; ?x --dbo:hasCountry--> France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}


From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Formal context over the attribute columns a to g (grouped by predicate A to E):

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.


Getting denitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, we may be in the presence of incomplete data

Rule                   Confidence  Support  Meaning
$d \Rightarrow c$      100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$      100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$   100%        2        Every person with the field computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$  100%      5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$        71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Table: Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
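The confidence in the example can be checked with a few lines of Python. The context below is restricted to the attributes a to d and is an assumption reconstructed from the supports in the rule table (all seven persons carry a and b, five of them also carry c and d); the function names are hypothetical.

```python
# Assumed restriction of the running example to attributes a-d (see lead-in).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities whose description contains every attribute of the set."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def confidence(premise, conclusion):
    """conf(premise -> conclusion) = |ext(premise ∪ conclusion)| / |ext(premise)|."""
    return len(extent(premise | conclusion)) / len(extent(premise))


def completion_candidates(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return extent(premise) - extent(conclusion)


print(round(confidence({"a", "b"}, {"c", "d"}), 2))   # association rule a,b -> c,d
print(confidence({"c", "d"}, {"a", "b"}))             # implication c,d => a,b
print(sorted(completion_candidates({"a", "b"}, {"c", "d"})))
```

The entities returned by `completion_candidates` are the ones whose descriptions would be completed if the rule is accepted as a definition.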


Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.


Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Table: Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Table: Dataset characteristics


Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)


RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. cases where $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.


From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data, we use Pattern Structures

[Figure: RDF graph for 350GT. 350GT --dc:subject--> dbpc:Sports_cars; 350GT --dc:subject--> dbpc:Lamborghini_vehicles; 350GT --rdf:type--> dbo:Automobile; 350GT --dbo:manufacturer--> dbp:Lamborghini; 350GT --dbo:productionYear--> 1964 (xsd:date)]


Columns: K_A (a, b), K_B (c), K_C (d), K_D (e), K_E (f, g), K_dbo:productionStartYear

Entity              Crosses (a to g)  dbo:productionStartYear
Reventon            × × × × × ×       ⟨[2008, 2008]⟩
Countach            × × × × ×         ⟨[1974, 1974]⟩
350GT               × × × × ×         ⟨[1963, 1963]⟩
400GT               × × × ×           ⟨[1965, 1965]⟩
Islero              × × × ×           ⟨[1967, 1967]⟩
Veneno              × ×               ⟨[2012, 2012]⟩
Aventador Roadster  × ×               -

Table: Heterogeneous context with numeric values


Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$


Heterogeneous Pattern Structures


Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
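The numeric part of this example can be reproduced with interval patterns, where the similarity of two intervals is their convex hull. The following Python sketch uses the production start years from the heterogeneous context; `meet`, `describe` and `extent` are hypothetical names, and the closure on the binary part is taken as given on the slide rather than recomputed.

```python
from functools import reduce

# Production start years from the heterogeneous context (Aventador Roadster has no value).
YEARS = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
         "400GT": 1965, "Islero": 1967, "Veneno": 2012}


def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))


def describe(entities):
    """Common interval description of a set of entities."""
    return reduce(meet, [(YEARS[e], YEARS[e]) for e in entities])


def extent(interval):
    """Entities whose year falls inside the interval."""
    lo, hi = interval
    return {e for e, year in YEARS.items() if lo <= year <= hi}


A1 = {"350GT", "400GT", "Islero"}
interval = describe(A1)
# Closure of A1 on the binary part, copied from the slide.
binary_closure = {"Reventon", "Countach", "350GT", "400GT", "Islero"}
print(interval)
print(sorted(binary_closure & extent(interval)))
```

Intersecting the two closures gives back `A1`, so `A1` is the extent of a heterogeneous pattern concept.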


The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x   .   .   .   x   .   .
s2          .   .   .   .   x   .   x   x
s3          .   .   x   x   .   .   .   .
s4          .   .   x   .   .   x   x   .
s5          .   .   .   .   .   .   x   x

[Figure: ACM Computing Classification taxonomy with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• [Ganter and Kuznetsov 2001] introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$

• The maximal elements of an order ideal are pairwise not comparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2  1  0  3  2 ]
Positions    1  2  3  4  5

Computational Time

Finding such a position takes $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
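As an illustration of constant-time queries, here is a minimal sparse-table sketch in Python. It is an assumption made for brevity, not the scheme referred to on the slide: a sparse table needs $O(n \log n)$ preprocessing rather than $O(n)$, but each query is still answered in $O(1)$. The function names are hypothetical.

```python
def build_sparse_table(values):
    """table[k][i] = 0-based index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """1-based position of the minimum on positions lo..hi (inclusive)."""
    lo, hi = lo - 1, hi - 1
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1

array = [2, 1, 0, 3, 2]   # the array from the slide
table = build_sparse_table(array)
```

For the array above, a query over positions 1 to 5 returns position 3, where the value 0 sits.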

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

One RMQ is computed for each pair of positions, one taken from A and one from B, for example:

• RMQ(4, 4) = 4
• RMQ(4, 18) = 15
• RMQ(20, 18) = 19

Result = {4, 8, 15, 19, 21} = {4, 21}
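The pairwise scheme can be sketched in Python as follows. The depth array is copied from the slide; the linear scan inside `rmq` is a stand-in for a constant-time RMQ structure, and the function names are hypothetical.

```python
# Depth array D copied from the slide (position i is stored at index i - 1).
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

def rmq(lo, hi):
    """1-based position of the minimum depth between two positions (linear scan)."""
    lo, hi = min(lo, hi), max(lo, hi)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])

def naive_candidates(A, B):
    """One RMQ per pair (a, b), i.e. |A| * |B| queries in total."""
    return sorted({rmq(a, b) for a in A for b in B})

A = [4, 12, 20]   # positions of C1, C5, C8
B = [4, 18, 22]   # positions of C1, C7, C9
candidates = naive_candidates(A, B)
```

The nine queries produce the five distinct candidate positions listed in the result line; reducing them to {4, 21} is a separate filtering step that this sketch does not perform.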


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C6, C13, C7, C8, C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
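The scaling can be reproduced with a small Python sketch. The parent links below are an assumption read off the depth array used in the RMQ slides, `TOP` stands for ⊤, and the helper names are hypothetical.

```python
# Parent links of the taxonomy tree; "TOP" stands for the top element.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def ancestors_or_self(node):
    """The node together with everything above it in the tree."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def filter_of(antichain):
    """Scaling step: all elements larger than at least one element of the antichain."""
    return set().union(*(ancestors_or_self(x) for x in antichain))

def minimal_elements(nodes):
    """Drop every node that is a proper ancestor of another node in the set."""
    return {x for x in nodes
            if not any(y != x and x in ancestors_or_self(y) for y in nodes)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
antichain = minimal_elements(common)
```

Each filter can hold up to |T| nodes, which is the cost the complexity discussion attributes to this approach.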

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$) -predicate ($U$)-> Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

NapoliLS97 -dc:creator-> Amedeo_Napoli
NapoliLS97 -dc:title-> title1
NapoliLS97 -dc:subject-> Classification

To avoid confusion, objects in FCA are called entities in what follows.
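The definition can be read as a membership test, sketched here in Python. The three classes and `is_rdf_triple` are hypothetical names, and treating `title1` as a literal is an assumption about the example graph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

def is_rdf_triple(s, p, o):
    """True when (s, p, o) lies in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (isinstance(s, (URI, BlankNode))
            and isinstance(p, URI)
            and isinstance(o, (URI, BlankNode, Literal)))

paper = URI("NapoliLS97")
triples = [
    (paper, URI("dc:creator"), URI("Amedeo_Napoli")),
    (paper, URI("dc:title"), Literal("title1")),       # assumed to be a literal
    (paper, URI("dc:subject"), URI("Classification")),
]
```

A literal in subject position or a blank node in predicate position is rejected by the same test.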

RDF to Entity-Descriptions

tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy $|T|$, with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes

• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of this array.

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has the child C13; C13 has children C7, C8 and C9]

The tree is traversed depth-first, and the depth of each visited node is appended to the array D:

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
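The construction of the array D can be sketched in Python as an Euler tour of the tree. The child lists are an assumption read off the node row of the array, `TOP` stands for ⊤, and the linear scan in `lca` stands in for a proper RMQ structure.

```python
# Children of each node, read off the taxonomy; "TOP" is the root.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Return (nodes, depths): a node is recorded on entry and after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths

def lca(nodes, depths, u, v):
    """Least common ancestor: the shallowest node between occurrences of u and v."""
    i, j = sorted((nodes.index(u), nodes.index(v)))
    k = min(range(i, j + 1), key=lambda p: depths[p])
    return nodes[k]

nodes, depths = euler_tour("TOP")
```

With this tour, the least common ancestor of two nodes is the node of minimal depth between their positions, which is what turns the LCA problem into a range minimum query.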


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the antichain {C1, C13}
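A minimal Python sketch of these steps follows. The arrays are copied from the slide; the linear-scan `rmq` and the quadratic filtering of the candidates are simplifications chosen for readability, not the way the final reduction is necessarily implemented, and all names are hypothetical.

```python
# Euler tour of the taxonomy, copied from the slide; "TOP" stands for the root.
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
         "TOP C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

def rmq(lo, hi):
    """1-based position of the minimum depth on positions lo..hi (linear scan)."""
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])

def node_at(pos):
    return NODES[pos - 1]

def intersect(A, B):
    """Intersect two antichains given as 1-based tour positions of their elements."""
    merged = sorted(A + B)
    # One RMQ per pair of consecutive positions in the merged list.
    candidates = [rmq(p, q) for p, q in zip(merged, merged[1:])]
    # Keep a candidate only if it is not a proper ancestor of another candidate.
    keep = set()
    for c in candidates:
        dominated = any(
            node_at(o) != node_at(c)
            and node_at(rmq(min(c, o), max(c, o))) == node_at(c)
            for o in candidates
        )
        if not dominated:
            keep.add(node_at(c))
    return candidates, keep

candidates, result = intersect([4, 12, 20], [4, 18, 22])
```

Only five queries are needed here, against nine for the pairwise scheme, since the merged list has six entries.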

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

[Figure: pattern concept lattice, drawn in rows K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

Pattern Concept lattice

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K7    s4               (p1, {C4, C7, C8})
K8    s2, s5           (p1, {C8, C9})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice, drawn in rows K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K7    s4               (p1, {C4, C7, C8})
K8    s2, s5           (p1, {C8, C9})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (regex(STR(?keywords), "pattern based classification", "i")
       || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query. ?paper -dc:creator-> ?author, ?paper -dc:title-> ?title, ?paper -dc:subject-> ?keywords]

Mapping $\mu: V \rightarrow U$

Given $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu: V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: matching RDF graph. NapoliLS97 -dc:creator-> Amedeo_Napoli, NapoliLS97 -dc:title-> title1, NapoliLS97 -dc:subject-> Classification]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title    keyword              author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{title\}$
• $A_v = \{author, keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
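How a VIEW BY clause could turn the answer table into formal contexts is sketched below in Python. The rows are copied from the slide; `view_by` is a hypothetical helper that illustrates the idea and is not the implementation behind Lattice-Based View Access.

```python
from collections import defaultdict

# Answer rows (title, keyword, author) copied from the slide.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")

def view_by(rows, columns, entity_column):
    """Build one entity -> attribute-set map per remaining column."""
    e = columns.index(entity_column)
    contexts = {c: defaultdict(set) for c in columns if c != entity_column}
    for row in rows:
        for i, column in enumerate(columns):
            if i != e:
                contexts[column][row[e]].add(row[i])
    return contexts

by_title = view_by(ROWS, COLUMNS, "title")    # entities are titles
by_author = view_by(ROWS, COLUMNS, "author")  # entities are authors
```

Switching the entity column from `title` to `author` reuses the same answer rows and only changes which values play the role of entities, which is the point of viewing one result set from several perspectives.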

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern. ?x -rdf:type-> Person, ?x -dbp:birthPlace-> Berlin, ?x -dbp:birthDate-> a date ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern. ?x -dcterms:subject-> FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

French Films, expressed through type and country

[Figure: graph pattern. ?x -rdf:type-> Film, ?x -dbo:hasCountry-> France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates              Objects
Index   URI             Index   URI
A       dc:subject      a       dbpc:Computer_Scientists
                        b       dbpc:Turing_Award_Laureates
B       dbp:award       c       dbp:TuringAward
C       rdf:type        d       dbo:Scientist
D       dbp:field       e       dbp:Computer Sciences
E       dbp:birthPlace  f       dbo:UnitedStates
                        g       dbo:UnitedKingdom

Formal context over the predicates A (a, b), B (c), C (d), D (e), E (f, g):

Entity    Crosses
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
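The scaling step can be sketched in Python: every (predicate, object) pair becomes one binary attribute. The triples are the ones listed for Person1; `to_formal_context` and `row` are hypothetical helper names.

```python
# Triples for Person1, copied from the slide.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

def to_formal_context(triples):
    """Scale triples into a binary context: one attribute per (predicate, object)."""
    entities = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence

def row(entity, attributes, incidence):
    """Render one context row with a cross for each attribute the entity has."""
    return ["×" if (entity, a) in incidence else "" for a in attributes]

G, M, I = to_formal_context(TRIPLES)
```

With more subjects in the triple list, `row` leaves a blank wherever a triple is absent, which is where missing information shows up in the context.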

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule           Confidence   Support   Meaning
d ⟹ c          100%         5         Every scientist has won a Turing Award
c ⟹ d          100%         5         Every person who has won a Turing Award is a scientist
e ⟹ c, d       100%         2         Every person whose field is computer science is a scientist who has won a Turing Award
c, d ⟹ a, b    100%         5         All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d    71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
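The confidence values in the example can be recomputed with a few lines of Python. The context below is restricted to the attributes a to d, and which persons hold them is an assumption inferred from the rule supports and the cross counts; support is counted on the premise, as in the table.

```python
# Binary context restricted to attributes a-d (assumption: inferred from the
# rule supports; attributes e, f, g are left out).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def support(itemset):
    """Number of entities whose description contains every attribute of itemset."""
    return sum(1 for attrs in CONTEXT.values() if itemset <= attrs)

def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)

def counterexamples(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return sorted(e for e, attrs in CONTEXT.items()
                  if premise <= attrs and not conclusion <= attrs)

ab, cd = {"a", "b"}, {"c", "d"}
```

The counterexamples of the rule from {a, b} to {c, d} are exactly the entities that the slide flags for completion.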

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow. Reference Universe -> Formal Context -> Mining Implications -> Ranking Implications -> Can a rule be a definition $X \equiv Y$? -> Yes -> Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

              Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    govenmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS: Front_Person_Shooters, cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

                Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         0.7           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation is used as the ground truth: each list of implications has a 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0), with one curve each for Cars, Smartphones, Countries and Videogames]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  • Completing RDF-Data using Association Rule Mining
  • Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relation, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections and strings.
• To deal with such data, we use Pattern Structures.

[Figure: RDF graph for 350GT with edges dc:subject to dbpc:Sports_cars, dc:subject to dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values
Contexts: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a, b, c, d, e, f, g) and $K_{dbo:productionStartYear}$ (intervals)

Entity              Crosses (a to g)  dbo:productionStartYear
Reventon            × × × × × ×       $\langle [2008, 2008] \rangle$
Countach            × × × × ×         $\langle [1974, 1974] \rangle$
350GT               × × × × ×         $\langle [1963, 1963] \rangle$
400GT               × × × ×           $\langle [1965, 1965] \rangle$
Islero              × × × ×           $\langle [1967, 1967] \rangle$
Veneno              × ×               $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×               -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
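The interval part of this closure can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the years are the production start years listed in the heterogeneous context, and the names `YEARS`, `interval_of` and `extent_of` are illustrative, not taken from the original work. Treating a missing year as "no interval" is also an assumption.

```python
# Production start years read from the heterogeneous context (None = no value).
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}


def interval_of(entities):
    """Smallest interval covering the years of all given entities (the ' operator on extents)."""
    years = [YEARS[e] for e in entities]
    if any(y is None for y in years):
        return None  # assumption: an entity without a year has no interval description
    return (min(years), max(years))


def extent_of(interval):
    """All entities whose year falls inside the interval (the ' operator on intervals)."""
    lo, hi = interval
    return {e for e, y in YEARS.items() if y is not None and lo <= y <= hi}


a1 = {"350GT", "400GT", "Islero"}
pattern = interval_of(a1)
print(pattern)                   # interval shared by A1
print(extent_of(pattern) == a1)  # A1 is closed with respect to the interval part
```

Intersecting this extent with the extent obtained from the binary attributes gives the heterogeneous closure shown on the slide.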

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.
• The extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$.
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x               x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: taxonomy. ⊤ → C12, C15; C12 → C10, C11; C10 → C1, C2; C11 → C4, C5; C15 → C13, C14; C13 → C6; C14 → C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M, x \leq y\}$.
• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is an ordered set whose elements are pairwise incomparable.
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2, m \leq ac_1 \text{ and } m \leq ac_2\}$.

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2  1  0  3  2 ]
Positions    1  2  3  4  5

Computational Time
The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
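To make the RMQ problem concrete, here is a minimal Python sketch based on a sparse table. Note the assumption: a sparse table answers queries in $O(1)$ but needs $O(n \log n)$ preprocessing, so it is a simpler stand-in for the $O(n)$ preprocessing scheme mentioned on the slide, not that scheme itself. The function names are illustrative, and indices are 0-based in the code (the slide uses 1-based positions).

```python
def build_sparse_table(values):
    """For every window whose length is a power of two, store the index of its minimum."""
    n = len(values)
    table = [list(range(n))]  # level 0: windows of length 1
    length = 2
    while length <= n:
        prev, half, level = table[-1], length // 2, []
        for start in range(n - length + 1):
            left, right = prev[start], prev[start + half]
            level.append(left if values[left] <= values[right] else right)
        table.append(level)
        length *= 2
    return table


def rmq(values, table, lo, hi):
    """Return the 0-based index of a minimal value in values[lo..hi] (both ends inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    level = (hi - lo + 1).bit_length() - 1      # largest power of two fitting in the range
    left = table[level][lo]                      # window starting at lo
    right = table[level][hi - (1 << level) + 1]  # window ending at hi
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 0, 4) + 1)  # 1-based position of the minimum of the whole array
print(rmq(array, table, 3, 4) + 1)  # 1-based position of the minimum over positions 4..5
```

The two overlapping windows cover the whole query range, which is why a single comparison per query is enough.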

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

RMQs are computed between positions of $A$ and positions of $B$, for example:
• RMQ(4, 4) = 4
• RMQ(4, 18) = 15
• RMQ(20, 18) = 19

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

[Figure: taxonomy. ⊤ → C12, C6; C12 → C10, C11; C10 → C1, C2; C11 → C4, C5; C6 → C13; C13 → C7, C8, C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
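The filter-based intersection can be sketched in Python as follows. This is an illustrative sketch, assuming the simplified taxonomy of this slide encoded as a child-to-parent map; `PARENT`, `filter_of` and `most_specific` are hypothetical names, and `"T"` stands for the top element ⊤.

```python
# Child -> parent map of the simplified taxonomy ("T" is the top element).
PARENT = {
    "C12": "T", "C6": "T", "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes that are equal to or more general than some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def most_specific(nodes):
    """Drop every node that is a proper ancestor of another node in the set."""
    ancestors = set()
    for node in nodes:
        parent = PARENT.get(node)
        while parent is not None:
            ancestors.add(parent)
            parent = PARENT.get(parent)
    return nodes - ancestors


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common))
print(sorted(most_specific(common)))  # the antichain describing the intersection
```

Each filter may contain a large part of the taxonomy, which is the cost the RMQ-based approach avoids.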

Complexity

Complexity of Naive Approach
The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

3 RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)
Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object.

[Figure: Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications
[Figure: NapoliLS97 with edges dc:creator to Amedeo_Napoli, dc:title to title1, and dc:subject to Classification]

To avoid confusion, we will refer to objects in FCA as entities.

RDF to Entity-Descriptions

tid  Subject  Predicate        Object       Provenance
t1   s1       dc:subject       o11          DBLP
t2   s2       dc:subject       o16          DBLP
t3   s1       rdf:type         Publication  DBLP
t4   o11      rdfs:subClassOf  C1           ACCS
t5   o12      rdfs:subClassOf  C2           ACCS
t6   C1       rdfs:subClassOf  C10          ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

ACM Computing Classification Taxonomy $T$
[Figure: taxonomy. ⊤ → C12, C15; C12 → C10, C11; C10 → C1, C2; C11 → C4, C5; C15 → C13, C14; C13 → C6; C14 → C7, C8, C9]

Structured Attribute Sets

• Each subject has a structured set of attributes.
• In our model, these structured sets of attributes form a taxonomy.

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)
Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy. ⊤ → C12, C6; C12 → C10, C11; C10 → C1, C2; C11 → C4, C5; C6 → C13; C13 → C7, C8, C9]

The array D is filled step by step while the taxonomy is traversed depth-first (⊤, C12, C10, C1, back to C10, and so on), recording the depth of each visited node:

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
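The construction of the array D and its use for computing a least common ancestor can be sketched in Python. This is a minimal sketch under assumptions: the simplified taxonomy is encoded as a hypothetical `TREE` dictionary with `"T"` standing for ⊤, positions are 0-based, and the range minimum is found by a plain linear scan rather than a constant-time RMQ structure.

```python
# Parent -> children map of the simplified taxonomy ("T" is the top element).
TREE = {
    "T": ["C12", "C6"], "C12": ["C10", "C11"], "C10": ["C1", "C2"],
    "C11": ["C4", "C5"], "C6": ["C13"], "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Depth-first walk that records a node each time it is entered or returned to."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)   # back at the parent after finishing the child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, a, b):
    """Least common ancestor = node of minimal depth between the first visits of a and b."""
    first = {}
    for position, node in enumerate(nodes):
        first.setdefault(node, position)
    lo, hi = sorted((first[a], first[b]))
    best = min(range(lo, hi + 1), key=depths.__getitem__)  # linear-scan RMQ
    return nodes[best]


nodes, depths = euler_tour(TREE, "T")
print(depths)
print(lca(nodes, depths, "C1", "C5"))
print(lca(nodes, depths, "C7", "C8"))
```

Replacing the linear scan with a preprocessed RMQ structure leaves the rest of the code unchanged.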

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
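The improved procedure can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements from the slides: only consecutive positions coming from different antichains are queried, the final reduction keeps the most specific candidates, the RMQ is a linear scan for brevity, and all names (`TREE`, `euler_tour`, `intersect_antichains`) are illustrative, with `"T"` standing for ⊤.

```python
TREE = {
    "T": ["C12", "C6"], "C12": ["C10", "C11"], "C10": ["C1", "C2"],
    "C11": ["C4", "C5"], "C6": ["C13"], "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Depth-first walk recording each node on entry and after every child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def intersect_antichains(tree, root, a, b):
    """Similarity of two antichains of a tree via RMQs on consecutive positions."""
    nodes, depths = euler_tour(tree, root)
    first = {}
    for position, node in enumerate(nodes):
        first.setdefault(node, position)

    def lca(x, y):
        lo, hi = sorted((first[x], first[y]))
        return nodes[min(range(lo, hi + 1), key=depths.__getitem__)]

    # Merge the positions of both antichains, remembering where each one comes from.
    merged = sorted([(first[x], "A", x) for x in a] + [(first[x], "B", x) for x in b])
    # One query per pair of neighbours that belong to different antichains.
    candidates = {lca(p[2], q[2]) for p, q in zip(merged, merged[1:]) if p[1] != q[1]}
    # Keep only candidates that are not an ancestor of another candidate.
    return {c for c in candidates
            if not any(c != d and lca(c, d) == c for d in candidates)}


print(sorted(intersect_antichains(TREE, "T", {"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

Because the merged list has at most $|A| + |B|$ entries, the number of queries stays linear in the size of the two antichains.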

Resulting Pattern Concept Lattice

[Figure: pattern concept lattice with concepts K0 to K11]

Details of the pattern concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Experimentation

Experiments on Linked Data

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments with numerical data from Bilkent University

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

• Formal contexts were created through inter-ordinal scaling.
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$.
• Not all datasets could be processed using the scaling approach.

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS.
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering.

[Figure: pattern concept lattice with concepts K0 to K11 and the table of their extents and intents]

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query. ?paper with edges dc:creator to ?author, dc:title to ?title, and dc:subject to ?keywords]

Mapping $\mu : V \rightarrow U$
Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g., $\mu_1$: ?keywords $\rightarrow$ Classification.

[Figure: matching RDF graph. NapoliLS97 with edges dc:creator to Amedeo_Napoli, dc:title to title1, and dc:subject to Classification]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
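How a VIEW BY clause turns query answers into a formal context can be sketched in Python. This is only an illustration under assumptions: the rows are the answers listed on this slide, and `view_by` is a hypothetical helper, not the API of the original system. The chosen variable provides the entities; every other (variable, value) pair becomes an attribute.

```python
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, view_variable):
    """Build a formal context: entity -> set of (variable, value) attributes."""
    entity_index = columns.index(view_variable)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[entity_index], set())
        for index, value in enumerate(row):
            if index != entity_index:
                attributes.add((columns[index], value))
    return context


by_title = view_by(ROWS, COLUMNS, "title")    # entities are titles
by_author = view_by(ROWS, COLUMNS, "author")  # same answers, entities are authors
for entity, attributes in sorted(by_title.items()):
    print(entity, sorted(attributes))
```

Calling the helper with a different variable gives the same answers organized from another perspective, which is the idea behind the views built with VIEW BY.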

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") ||
         regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x with edges rdf:type to Person, dbp:birthPlace to Berlin, and dbp:birthDate to a value $\leq$ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER(?d <= 1900)
}

French Films

[Figure: ?x with edge dcterms:subject to FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: ?x with edges rdf:type to Film and dbo:hasCountry to France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates                Objects
Index  URI                Index  URI
A      dc:subject         a      dbpc:Computer_Scientists
                          b      dbpc:Turing_Award_Laureates
B      dbp:award          c      dbp:TuringAward
C      rdf:type           d      dbo:Scientist
D      dbp:field          e      dbp:Computer_Sciences
E      dbp:birthPlace     f      dbo:UnitedStates
                          g      dbo:UnitedKingdom

Predicates:  A  B  C  D  E
Objects:     a  b  c  d  e  f  g

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
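As a rough illustration of this scaling step, the sketch below turns a list of triples into a formal context in which each (predicate, object) pair is one attribute. The function name `triples_to_context` is an assumption made for this example, and only the four triples listed on the slide are used.

```python
def triples_to_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

context = triples_to_context(triples)
# Every distinct (predicate, object) pair becomes one column of the context.
attributes = sorted(set().union(*context.values()))
```

Each element of `context["Person1"]` plays the role of one cross in the row of Person1.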

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$ | 100% | 5 | All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition, $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
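The confidence computation can be checked with a small Python sketch. The exact positions of the crosses in the running example are not recoverable here, so the context below is an assumed one that is merely consistent with the rule table (all seven persons have a and b, five have c and d, two have e); the birthplace columns f and g are left out. The helper names are illustrative.

```python
# Assumed context, consistent with the supports and confidences in the rule table.
context = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def support(itemset):
    """Number of entities whose description contains every attribute of itemset."""
    return sum(1 for attrs in context.values() if itemset <= attrs)


def confidence(premise, conclusion):
    """Fraction of entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return sorted(
        entity
        for entity, attrs in context.items()
        if premise <= attrs and not conclusion <= attrs
    )


conf_ab_cd = confidence({"a", "b"}, {"c", "d"})      # 5 / 7
candidates = needs_completion({"a", "b"}, {"c", "d"})
```

With this context, `candidates` lists Person6 and Person7, the two entities whose descriptions would be completed if the rule is accepted as a definition.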

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow. Reference universe, formal context, mining implications, ranking implications, "Can a rule be a definition $X \equiv Y$?", if yes: complete data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames     Smartphones        Countries
Restriction   dc:subject         dc:subject     dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type       rdf:type           rdf:type
              dc:subject         dc:subject     dc:subject         dc:subject
              bodyStyle          cp             manufacturer       language
              transmission       developer      operativeSystem    governmentType
              assembly           requirement    developer          leaderType
              designer           genre          cpu                foundingDate
              layout             releaseDate                       gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         07            5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph for 350GT. dc:subject dbpc:Sports_cars; dc:subject dbpc:Lamborghini_vehicles; rdf:type dbo:Automobile; dbo:manufacturer dbp:Lamborghini; dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Contexts:    K_A  K_B  K_C  K_D  K_E       K_dbo:productionStartYear
Attributes:  a b  c    d    e    f g

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\square\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
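The numeric part of this example can be reproduced with a short Python sketch of interval patterns. It uses only the production start years listed in the heterogeneous context (Aventador Roadster has no value and is omitted); the function names `describe` and `extent` are assumptions made for the illustration.

```python
# Production start years taken from the heterogeneous context above.
years = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}


def describe(entities):
    """Common description of a set of entities: the smallest interval covering their years."""
    values = [years[e] for e in entities]
    return (min(values), max(values))


def extent(interval):
    """All entities whose year falls inside the interval."""
    low, high = interval
    return {e for e, year in years.items() if low <= year <= high}


a1 = {"350GT", "400GT", "Islero"}
interval = describe(a1)      # description shared by A1
closure = extent(interval)   # entities matching that description
```

Here `closure` equals `a1` again, which is the numeric half of the statement that $A_1$ is closed.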

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                               x       x   x
s3                   x   x
s4                   x               x   x
s5                                       x   x

[Figure: taxonomy of classes rooted at ⊤, with nodes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M, x \le y\}$
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2, m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved with $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array
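To make the RMQ problem concrete, here is a minimal Python sketch based on a sparse table. This is a simpler variant than the one the slide refers to: its preprocessing takes $O(n \log n)$ rather than $O(n)$, while each query is still answered in constant time. Function names are illustrative, and indices are 0-based, so position 3 on the slide is index 2.

```python
def build_sparse_table(values):
    """table[j][i] is the index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi], both ends inclusive."""
    j = (hi - lo + 1).bit_length() - 1
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
whole_range = rmq(array, table, 0, 4)  # index of the value 0
```

A query combines two overlapping power-of-two windows that together cover the range, which is why no loop is needed at query time.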

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

An RMQ is computed for each pair of positions taken from A and B, for example:

RMQ(4, 4) = 4
RMQ(4, 18) = 15
RMQ(20, 18) = 19

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
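The scaling-based intersection can be sketched in Python as follows. The `parent` map encodes the small taxonomy of this example (the root ⊤ is written `"TOP"`), and the helper names are assumptions for illustration only.

```python
# Child -> parent map of the example taxonomy; "TOP" stands for the root.
parent = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors(node):
    """The node itself together with every node above it."""
    chain = {node}
    while node in parent:
        node = parent[node]
        chain.add(node)
    return chain


def filter_of(antichain):
    """Filter generated by an antichain: union of the ancestor chains."""
    return set().union(*(ancestors(n) for n in antichain))


def to_antichain(up_set):
    """Keep the most specific elements, i.e. those not above another element."""
    return {
        n for n in up_set
        if not any(m != n and n in ancestors(m) for m in up_set)
    }


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
result = to_antichain(common)
```

Every step manipulates whole filters, so the cost grows with the size of the taxonomy rather than with the size of the antichains.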

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|\mathrm{Leaves}(T)|)$ needed to intersect antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 29: Interactive Knowledge Discovery over Web of Data

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs $U$, blank nodes $B$ and literals $L$, an RDF triple is represented as $t = (s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$, where $s$ is a subject, $p$ is a predicate and $o$ is an object

[Figure: Subject ($U \cup B$), predicate ($U$), Object ($U \cup B \cup L$); U = URI, B = Blank Nodes, L = Literal]

Example of RDF Graph for Publications

[Figure: NapoliLS97 dc:title title1; NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:subject Classification]

To avoid confusion, we refer to FCA objects as entities

RDF to Entity-Descriptions

tid   Subject   Predicate         Object        Provenance
t1    s1        dc:subject        o11           DBLP
t2    s2        dc:subject        o16           DBLP
t3    s1        rdf:type          Publication   DBLP
t4    o11       rdfs:subClassOf   C1            ACCS
t5    o12       rdfs:subClassOf   C2            ACCS
t6    C1        rdfs:subClassOf   C10           ACCS

RDF triples from several resources

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy of classes rooted at ⊤, with nodes C1, C2 and C4 to C15]

ACM Computing Classification Taxonomy $|T|$

Structured Attribute Sets

• Each subject has a structured set of attributes
• In our model, these structured sets of attributes form a taxonomy

How to find similarity in case of Structured Attribute Sets

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

The array D holds the depth of each node visited along a depth-first tour of the tree:

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
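The reduction from LCA to RMQ can be tried out with the following Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the tree is the example taxonomy with the root written `"TOP"`, the RMQ is a plain linear scan instead of a constant-time structure, and the intersection compares every pair of elements (the naive variant) rather than only consecutive positions.

```python
# Parent -> children map of the example taxonomy; "TOP" stands for the root.
tree = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Depth-first tour: visited nodes, their depths, first index of each node."""
    nodes, depths, first = [], [], {}

    def visit(node, depth):
        first.setdefault(node, len(nodes))
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)   # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths, first


nodes, depths, first = euler_tour(tree, "TOP")


def lca(u, v):
    """Least common ancestor: the shallowest node between the two first visits."""
    i, j = sorted((first[u], first[v]))
    k = min(range(i, j + 1), key=depths.__getitem__)  # linear-scan RMQ
    return nodes[k]


def intersect(a, b):
    """Most specific common ancestors of two antichains (pairwise, naive)."""
    candidates = {lca(x, y) for x in a for y in b}
    return {
        c for c in candidates
        if not any(c != d and lca(c, d) == c for d in candidates)
    }


result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

The `depths` list reproduces the array D shown above, with indices shifted by one because Python lists are 0-based. Replacing the linear scan in `lca` with a precomputed RMQ structure would give constant-time queries.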

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy of classes rooted at ⊤, with nodes C1, C2 and C4 to C15]

[Figure: pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K8    s2, s5            (p1, {C8, C9})
K7    s4                (p1, {C4, C7, C8})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0043 sec   0088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using the scaling approach
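As a reminder of what inter-ordinal scaling does, the following Python sketch turns numeric values into binary attributes of the form "<= t" and ">= t". The input values are made up for illustration and are not taken from the Bilkent datasets; the function name is hypothetical.

```python
def interordinal_scale(values):
    """Map each numeric value to binary attributes '<= t' and '>= t'."""
    thresholds = sorted(set(values))
    context = {}
    for v in values:
        context[v] = (
            {f"<= {t}" for t in thresholds if v <= t}
            | {f">= {t}" for t in thresholds if v >= t}
        )
    return context


# Illustrative values only.
context = interordinal_scale([1, 3, 5])
for value, attrs in context.items():
    print(value, sorted(attrs))

# The attributes shared by 1 and 5 encode the interval [1, 5].
print(sorted(context[1] & context[5]))
```

The number of binary attributes grows with the number of distinct values, which is one reason why a scaled context can become much larger than the original numerical data.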

What we did so far


How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K7    s4               (p1, {C4, C7, C8})
K8    s2, s5           (p1, {C8, C9})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs


SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: graph pattern of the query: ?paper --dc:creator--> ?author, ?paper --dc:title--> ?title, ?paper --dc:subject--> ?keywords]

Mapping µ : V → U

Given U and V, a mapping µ is a partial function µ : V → U, e.g. µ1 : ?keywords → Classification

[Figure: one answer of the query: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2, ...}
• M1 = µ(Av1) = {author1, author2, ...}
• M2 = µ(Av2) = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
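A minimal Python sketch of how the answers of such a query could be split into entities and attribute sets once a VIEW BY variable is chosen. This is an assumption about the general idea, not the authors' implementation: the rows are the ones listed on the slide, and `view_by` is a hypothetical helper.

```python
from collections import defaultdict

# Query answers as listed on the slide, one dict per row.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
]


def view_by(rows, view_var):
    """Build one binary context per remaining variable.

    The values of view_var become the entities (G); the values of every
    other variable become an attribute set (M1, M2, ...).
    """
    contexts = defaultdict(lambda: defaultdict(set))
    for row in rows:
        entity = row[view_var]
        for var, value in row.items():
            if var != view_var:
                contexts[var][entity].add(value)
    return {var: dict(ctx) for var, ctx in contexts.items()}


by_title = view_by(ROWS, "title")
print(sorted(by_title["author"]["title2"]))   # authors attached to title2
by_author = view_by(ROWS, "author")
print(sorted(by_author["title"]["author1"]))  # titles attached to author1
```

Switching the VIEW BY variable only changes which column plays the role of the entities, so the same answers give a different context, and hence a different lattice, for each perspective.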

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x --rdf:type--> Person, ?x --dbp:birthPlace--> Berlin, ?x --dbp:birthDate--> (≤ 1900)]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

Problem Statement

French Films

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm, ?x --rdf:type--> Film, ?x --dbo:hasCountry--> France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Predicates: A (a, b), B (c), C (d), D (e), E (f, g)

Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
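The scaling step can be sketched in Python as follows, under the assumption that every distinct (predicate, object) pair becomes one binary attribute, which is how the index table above reads. The triples are the four listed for Person1; the function name is hypothetical.

```python
# The four triples shown on the slide for Person1.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def to_formal_context(triples):
    """Turn triples into (entities, attributes, incidence).

    Assumption: each (predicate, object) pair is one binary attribute and
    each triple contributes exactly one cross.
    """
    entities = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence


entities, attributes, incidence = to_formal_context(TRIPLES)
for g in entities:
    row = ["x" if (g, m) in incidence else "." for m in attributes]
    print(g, " ".join(row))
```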

Getting definitions from implications and association rules

• If X ⟹ Y and Y ⟹ X then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y and Y → X has high confidence, we may be in the presence of incompleteness in the data

Rule          Confidence  Support  Meaning
d ⟹ c         100%        5        Every scientist has won a Turing Award
c ⟹ d         100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d      100%        2        All the people having the field computer science are Turing Award winner scientists
c, d ⟹ a, b   100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d   71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
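The confidence value and the entities needing completion can be reproduced with a few lines of Python. The context below is an assumption: it is restricted to the attributes a to d and reconstructed from the supports and confidences in the table (the exact cross positions are not recoverable from the flattened context), so it is an illustration rather than the original data.

```python
# Assumed context restricted to attributes a, b, c, d, consistent with
# the supports and confidences reported in the table above.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs."""
    return len(extent(lhs | rhs)) / len(extent(lhs))


def needs_completion(lhs, rhs):
    """Entities that satisfy lhs but not rhs."""
    return extent(lhs) - extent(lhs | rhs)


print(round(confidence({"a", "b"}, {"c", "d"}), 2))   # 0.71
print(confidence({"c", "d"}, {"a", "b"}))             # 1.0
print(sorted(needs_completion({"a", "b"}, {"c", "d"})))
```

When one direction holds with confidence 1 and the other with high confidence, the counter-examples of the weaker direction are the entities proposed for completion, here Person6 and Person7.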

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition X ≡ Y?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    governmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS: Front_Person_Shooters, cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec. time [s]  1732    1714         07            5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) versus recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya, 2010]
• How to deal with near-definitions, i.e. X → Y and Y → X are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37–54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129–142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342–1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190–208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153–168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2–21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449–473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around 350GT; nodes: dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini, 1964 (xsd:date); edge labels: dc:subject (twice), rdf:type, dbo:manufacturer, dbo:productionYear]

Why Pattern Structures

Entity               KA to KE (attributes a to g)   K_dbo:productionStartYear
Reventon             × × × × × ×                    ⟨[2008, 2008]⟩
Countach             × × × × ×                      ⟨[1974, 1974]⟩
350GT                × × × × ×                      ⟨[1963, 1963]⟩
400GT                × × × ×                        ⟨[1965, 1965]⟩
Islero               × × × ×                        ⟨[1967, 1967]⟩
Veneno               × ×                            ⟨[2012, 2012]⟩
Aventador Roadster   × ×                            -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli, 2014]

• Let a predicate p ∈ P
  - range(p) ⊆ U: p is an object relation
  - range(p) ⊆ L: p is a literal relation
• Pattern Structures K_p = (G, (D_p, ⊓), δ_p)
  - (D_p, ⊑) is an ordered set of descriptions defined for the elements in range(p)
  - δ_p maps entities g ∈ G to their descriptions in D_p
• Heterogeneous Pattern Structures (G, H, Δ)
  - H = ⨉ D_p (p ∈ P) is the Cartesian product of all the description sets D_p
  - Δ maps an entity g ∈ G to a tuple where each component corresponds to a description in a set D_p

Heterogeneous Pattern Structures

Let A1 = {350GT, 400GT, Islero}, then

• (A1)∘∘
  - (A1)∘ = {a, b, c, d}
  - {a, b, c, d}∘ = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)∘∘ = {Reventon, Countach, 350GT, 400GT, Islero}
• (A1)□□
  - (A1)□ = [1963, 1967] and [1963, 1967]□ = A1
  - (A1)□□ = A1
• (A1)⋄⋄ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1
• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"
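The numeric part of this derivation can be checked with a short Python sketch of interval patterns, where the similarity of two intervals is taken to be their convex hull. The years come from the heterogeneous context shown earlier; Aventador Roadster is left out because its value is missing. Function names are hypothetical.

```python
from functools import reduce

# Production start years from the heterogeneous context, as intervals.
YEARS = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
}


def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))


def description(entities):
    """Common interval description of a set of entities."""
    return reduce(meet, (YEARS[g] for g in entities))


def extent(interval):
    """Entities whose interval lies inside the given interval."""
    lo, hi = interval
    return {g for g, (a, b) in YEARS.items() if lo <= a and b <= hi}


a1 = {"350GT", "400GT", "Islero"}
print(description(a1))                   # (1963, 1967)
print(sorted(extent(description(a1))))   # ['350GT', '400GT', 'Islero']
```

Intersecting this extent with the extent obtained from the binary part gives the extent of the heterogeneous concept, as in the last step of the derivation.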

The proposition of [Carpineto and Romano, 1996]

• Let K = (G, M, I) be a formal context
• Extended set of attributes M* organized in a subsumption hierarchy based on a partial ordering ≤_M*
• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x               x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: ACM Computing Classification taxonomy with root ⊤ and classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov, 2001]

• In [Ganter and Kuznetsov, 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set (M, ≤_M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M : x ≤ y}
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set
• For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2 : m ≤ ac1 and m ≤ ac2}

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and O(1) time per query (where n is the number of elements in the array)
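For illustration, here is a Python sketch of a sparse table, a common way to answer range minimum queries in constant time. Note the assumption: a sparse table needs O(n log n) preprocessing, so it is a simpler stand-in for the O(n) preprocessing mentioned above, not the same structure. Positions are 0-indexed in the code, whereas the slide counts from 1.

```python
def build_sparse_table(values):
    """table[j][i] = index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi] (0-indexed, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]          # the array from the slide
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 0, 4) + 1)   # position 3 in the slide's numbering
print(rmq(ARRAY, TABLE, 3, 4) + 1)   # position 5 in the slide's numbering
```

Each query combines two precomputed overlapping windows of length 2^k, which is valid because taking a minimum over overlapping ranges does not change the result.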

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9} we have A = {4, 12, 20} and B = {4, 18, 22}

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}
Result = {4, 8, 15, 19, 21} = {4, 21}


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 (with C1, C2) and C11 (with C4, C5); C6 has child C13 (with C7, C8, C9)]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
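The filter-based computation can be reproduced with the following Python sketch. It assumes the tree read off the Euler tour used on the RMQ slides, writes `TOP` for ⊤, and treats each element as belonging to its own filter; the helper names are hypothetical.

```python
# Child -> parent map of the taxonomy; "TOP" stands for the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """The node together with all its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def filter_of(antichain):
    """Filter generated by an antichain: everything above one of its elements."""
    return set().union(*(ancestors_or_self(x) for x in antichain))


def most_specific(nodes):
    """Elements of nodes that are not a strict ancestor of another element."""
    return {
        x for x in nodes
        if not any(x in ancestors_or_self(y) - {y} for y in nodes)
    }


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
print(sorted(fil_a & fil_b))                 # the common filter
print(sorted(most_specific(fil_a & fil_b)))  # ['C1', 'C13']
```

The intersection is computed on whole filters, whose size depends on the size of the taxonomy rather than on the size of the antichains.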

Complexity

Complexity of Naive Approach

The number of RMQs is O(|A| · |B|), one per pair of elements taken from A and B

Complexity of Improved Approach

The cardinality of A ∪ B is at most |A| + |B|, hence the number of pairs of consecutive elements is O(|A| + |B|), and thus the number of RMQs of consecutive elements is O(|A| + |B|)

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than O(|Leaves(T)|) for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than O(|Leaves(T)|) for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Resource Description Framework

Definition (RDF Triple)

Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object

Example of RDF Graph for Publications

[Figure: NapoliLS97 --dc:creator--> Amedeo_Napoli, NapoliLS97 --dc:title--> title1, NapoliLS97 --dc:subject--> Classification]

To avoid confusion, we will call objects in FCA entities
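The membership condition in the definition can be written down directly. The Python sketch below uses a hypothetical `Term` type tagged with its kind (URI, blank node or literal) and checks that a triple lies in (U ∪ B) × U × (U ∪ B ∪ L); it is an illustration of the definition, not an RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # one of "uri", "bnode", "literal" (illustrative tagging)
    value: str


def is_rdf_triple(s, p, o):
    """Check t = (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        s.kind in {"uri", "bnode"}
        and p.kind == "uri"
        and o.kind in {"uri", "bnode", "literal"}
    )


paper = Term("uri", "NapoliLS97")
creator = Term("uri", "dc:creator")
author = Term("uri", "Amedeo_Napoli")
title = Term("literal", "title1")

print(is_rdf_triple(paper, creator, author))              # True
print(is_rdf_triple(paper, Term("uri", "dc:title"), title))  # True
print(is_rdf_triple(title, creator, author))              # False: literal subject
```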

RDF to Entity-Descriptions

tid Subject Predicate Object Provenance

t1 s1 dcsubject o11 DBLPt2 s2 dcsubject o16 DBLPt3 s1 rdftype Publication DBLPt4 o11 rdfssubClassOf C1 ACCSt5 o12 rdfssubClassOf C2 ACCSt6 C1 rdfssubClassOf C10 ACCS

RDF triples from several resources

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: ACM Computing Classification Taxonomy T, a tree rooted at ⊤ over the classes C1, C2 and C4 to C15]


Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, this structured set of attributes forms a taxonomy

How to find similarity in the case of structured attribute sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes

• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: taxonomy tree: ⊤ → {C12, C6}; C12 → {C10, C11}; C10 → {C1, C2}; C11 → {C4, C5}; C6 → C13; C13 → {C7, C8, C9}]

Depth array D obtained from the Euler tour of the tree:

Position:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D:   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
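The depth array on this slide can be reproduced with a depth-first Euler tour. The following is only a minimal sketch: the `TREE` dictionary is an assumption read off the node sequence shown above, and the function name `euler_tour` is illustrative rather than taken from the original work.

```python
# Child lists of the taxonomy, as inferred from the Euler tour on the slide (assumption).
TREE = {
    "T": ["C12", "C6"],          # "T" stands for the top element
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return (nodes, depths): every node is recorded on entry and again after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)   # back at the parent after finishing the child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour(TREE, "T")
print(depths)
```

With this tree the tour has 25 entries, and C1, C5 and C8 first appear at the 1-based positions 4, 12 and 20 used on the following slides.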

Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Position:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D:   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}, i.e., {C1, C13}

Positions 19 and 21 both denote C13, while positions 8 (C12) and 15 (⊤) are ancestors of retained nodes and are therefore dropped.
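The procedure on this slide can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `rmq` is a naive linear scan instead of a constant-time structure, positions are 0-based internally, and the helper names (`first_pos`, `is_ancestor`, `intersect`) are hypothetical.

```python
# Euler tour and depth array copied from the slide ("T" stands for the top element).
NODES = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4", "C11", "C5",
         "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8", "C13", "C9", "C13", "C6", "T"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def first_pos(node):
    """0-based position of the first occurrence of a node in the Euler tour."""
    return NODES.index(node)


def rmq(i, j):
    """0-based position of the minimum depth in DEPTH[i..j] (inclusive), by linear scan."""
    return min(range(i, j + 1), key=DEPTH.__getitem__)


def is_ancestor(u, v):
    """True if u is an ancestor of v (or equal to v), i.e. u is their least common ancestor."""
    i, j = sorted((first_pos(u), first_pos(v)))
    return NODES[rmq(i, j)] == u


def intersect(a, b):
    """Intersection of two antichains: most specific common ancestors of neighbouring pairs."""
    tagged = sorted([(first_pos(n), "A") for n in a] + [(first_pos(n), "B") for n in b])
    candidates = set()
    for (p, s), (q, t) in zip(tagged, tagged[1:]):
        if s != t:  # only neighbours coming from different sets
            candidates.add(NODES[rmq(p, q)])
    # keep only the most specific candidates (drop proper ancestors of other candidates)
    return {c for c in candidates
            if not any(c != d and is_ancestor(c, d) for d in candidates)}


result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(result))
```

Replacing the linear scan in `rmq` by a preprocessed structure changes only the cost of each query, not the result.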


Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: taxonomy tree rooted at ⊤ over the classes C1, C2 and C4 to C15]

[Figure: Pattern concept lattice with top K0, bottom K11 and the intermediate concepts K1 to K10]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice


Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0043 sec   0088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$

• Not all datasets could be processed using the scaling approach


What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with top K0, bottom K11 and the intermediate concepts K1 to K10]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K7   s4              (p1, {C4, C7, C8})
K8   s2, s5          (p1, {C8, C9})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization


4 Creating Views over RDF-Graphs

SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: query graph pattern: ?paper --dc:creator--> ?author; ?paper --dc:title--> ?title; ?paper --dc:subject--> ?keywords]

Mapping $\mu : V \to U$

Given $U$ and a set of variables $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification

[Figure: matching RDF graph: NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:subject--> Classification]
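To make the notion of a mapping concrete, the sketch below models µ as a Python dictionary from variables to terms and applies it to a triple pattern. The names `apply_mapping`, `mu1` and `mu` are hypothetical; only the example binding of ?keywords to Classification comes from the slide.

```python
from typing import Dict, Tuple

Mapping = Dict[str, str]  # partial function: variable name -> RDF term


def apply_mapping(mu: Mapping, pattern: Tuple[str, str, str]) -> Tuple[str, ...]:
    """Substitute every variable bound by mu; unbound variables are left untouched."""
    return tuple(mu.get(term, term) for term in pattern)


pattern = ("?paper", "dc:subject", "?keywords")

# mu1 binds a single variable, so it is partial with respect to the pattern.
mu1 = {"?keywords": "Classification"}

# A mapping that binds every variable of the query (values taken from the example graph).
mu = {"?paper": "NapoliLS97", "?author": "Amedeo_Napoli",
      "?title": "title1", "?keywords": "Classification"}

print(apply_mapping(mu1, pattern))
print(apply_mapping(mu, pattern))
```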

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v$ = {?title}

• $A_v$ = {?author, ?keyword}

• $G = \mu(E_v)$ = {title1, title2, ...}

• $M_1 = \mu(A_{v1})$ = {author1, author2, ...}

• $M_2 = \mu(A_{v2})$ = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
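As a rough illustration of how a VIEW BY clause could turn the result table into a formal context, the sketch below groups the rows by the chosen variable and collects the remaining bindings as attributes. The function `view_by` and the (column, value) attribute encoding are assumptions made for this example, not the interface of the original system.

```python
from collections import defaultdict

COLUMNS = ("title", "keyword", "author")
ROWS = [  # result rows as listed on the slide
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, entity_column):
    """Map each value of entity_column (an entity) to its set of (column, value) attributes."""
    idx = columns.index(entity_column)
    context = defaultdict(set)
    for row in rows:
        for column, value in zip(columns, row):
            if column != entity_column:
                context[row[idx]].add((column, value))
    return dict(context)


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Changing the third argument from "title" to "author" gives the same answers seen from another perspective, which is the idea behind views from different perspectives.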


Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x --rdf:type--> Person; ?x --dbp:birthPlace--> Berlin; ?x --dbp:birthDate--> value ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }


Problem Statement

French Films

[Figure: graph pattern: ?x --dcterms:subject--> FrenchFilm; ?x --rdf:type--> Film; ?x --dbo:hasCountry--> France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }


From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Attribute columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
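The construction can be sketched as follows: every subject becomes an entity, and every (predicate, object) pair becomes a binary attribute. This is only an illustrative sketch using the four Person1 triples listed on the slide; the function names `build_context` and `cross_table` are hypothetical.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def build_context(triples):
    """Return (entities, attributes, incidence) of the scaled formal context."""
    entities = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return entities, attributes, incidence


def cross_table(entities, attributes, incidence):
    """One row of 'x' / '.' marks per entity, in the order of the attributes list."""
    return {g: ["x" if (g, m) in incidence else "." for m in attributes] for g in entities}


entities, attributes, incidence = build_context(TRIPLES)
table = cross_table(entities, attributes, incidence)
for g, row in table.items():
    print(g, " ".join(row))
```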


Getting definitions from implications and association rules

• If $X \implies Y$ and $Y \implies X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \implies Y$ and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                  Confidence  Support  Meaning
$d \implies c$        100%        5        Every scientist has won a Turing Award
$c \implies d$        100%        5        Every person who has won a Turing Award is a scientist
$e \implies c, d$     100%        2        Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \implies a, b$  100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$       71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $a, b \equiv c, d$, i.e., $a, b \implies c, d$ and $c, d \implies a, b$
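The confidence value can be checked with a few lines of code. The context below is an assumed simplification that keeps only the attributes a, b, c and d: all seven persons carry a and b, and Person1 to Person5 also carry c and d, which is what the reported supports imply. The helper names are illustrative.

```python
# Simplified running example (assumption): only attributes a, b, c, d are modelled.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, description in CONTEXT.items() if attrs <= description}


def confidence(premise, conclusion):
    """conf(premise -> conclusion) = |extent(premise and conclusion)| / |extent(premise)|."""
    return len(extent(premise | conclusion)) / len(extent(premise))


conf_ab_cd = confidence({"a", "b"}, {"c", "d"})
conf_cd_ab = confidence({"c", "d"}, {"a", "b"})
to_complete = extent({"a", "b"}) - extent({"c", "d"})
print(round(conf_ab_cd, 2), conf_cd_ab, sorted(to_complete))
```

The entities in `to_complete` are exactly those that break the implication a, b ⟹ c, d, so they are the candidates for completion.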


Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition X ≡ Y?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.


Experimentation

Dataset building conditions

Dataset      Restriction                   Predicates
Cars         dc:subject dbpc:Sports_cars   rdf:type, dc:subject, bodyStyle, transmission, assembly, designer, layout
Videogames   dc:subject dbpc:FPS           rdf:type, dc:subject, cp, developer, requirement, genre, releaseDate
Smartphones  dc:subject dbpc:Smartphones   rdf:type, dc:subject, manufacturer, operativeSystem, developer, cpu
Countries    rdf:type Country              rdf:type, dc:subject, language, govenmentType, leaderType, foundingDate, gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics


Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth; each list of implications has a 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) versus recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)


RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Points of View

• Hide non-interesting parts of the lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph for 350GT: 350GT --dc:subject--> dbpc:Sports_cars; 350GT --dc:subject--> dbpc:Lamborghini_vehicles; 350GT --rdf:type--> dbo:Automobile; 350GT --dbo:manufacturer--> dbp:Lamborghini; 350GT --dbo:productionYear--> 1964 (xsd:date)]


Why Pattern Structures

Entity              K_A ... K_E (a b c d e f g)  K_dbo:productionStartYear
Reventon            × × × × × ×                  ⟨[2008, 2008]⟩
Countach            × × × × ×                    ⟨[1974, 1974]⟩
350GT               × × × × ×                    ⟨[1963, 1963]⟩
400GT               × × × ×                      ⟨[1965, 1965]⟩
Islero              × × × ×                      ⟨[1967, 1967]⟩
Veneno              × ×                          ⟨[2012, 2012]⟩
Aventador Roadster  × ×                          -

Heterogeneous context with numeric values


Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$


Heterogeneous Pattern Structures

Entity              K_A ... K_E (a b c d e f g)  K_dbo:productionStartYear
Reventon            × × × × × ×                  ⟨[2008, 2008]⟩
Countach            × × × × ×                    ⟨[1974, 1974]⟩
350GT               × × × × ×                    ⟨[1963, 1963]⟩
400GT               × × × ×                      ⟨[1965, 1965]⟩
Islero              × × × ×                      ⟨[1967, 1967]⟩
Veneno              × ×                          ⟨[2012, 2012]⟩
Aventador Roadster  × ×                          -

Let $A_1$ = {350GT, 400GT, Islero}, then

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ}$ = {a, b, c, d}
  - $\{a, b, c, d\}^{\circ}$ = {Reventon, Countach, 350GT, 400GT, Islero}
  - $(A_1)^{\circ\circ}$ = {Reventon, Countach, 350GT, 400GT, Islero}

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond}$ = {Reventon, Countach, 350GT, 400GT, Islero} $\cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"
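The common description of A1 can be sketched by combining two similarity operations component-wise: set intersection for the binary attributes and the convex hull for the year intervals. The binary attribute sets below are assumptions (the positions of the crosses are not recoverable from the table; they were chosen so that the three cars share exactly a, b, c and d), and the names `meet` and `common_description` are hypothetical.

```python
from functools import reduce

# (binary attributes, production start year interval); the attribute sets are assumed.
DESCRIPTIONS = {
    "350GT": (frozenset("abcde"), (1963, 1963)),
    "400GT": (frozenset("abcd"), (1965, 1965)),
    "Islero": (frozenset("abcd"), (1967, 1967)),
}


def meet(d1, d2):
    """Component-wise similarity: intersect the attribute sets, take the hull of the intervals."""
    (attrs1, (lo1, hi1)), (attrs2, (lo2, hi2)) = d1, d2
    return (attrs1 & attrs2, (min(lo1, lo2), max(hi1, hi2)))


def common_description(entities):
    """Similarity of the descriptions of all given entities."""
    return reduce(meet, (DESCRIPTIONS[e] for e in entities))


attrs, interval = common_description(["350GT", "400GT", "Islero"])
print(sorted(attrs), interval)
```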


The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• An extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$

• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x   .   .   .   x   .   .
s2          .   .   .   .   x   .   x   x
s3          .   .   x   x   .   .   .   .
s4          .   .   x   .   .   x   x   .
s5          .   .   .   .   .   .   x   x

[Figure: taxonomy tree rooted at ⊤ over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal Idl can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise incomparable


• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$
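The definition of the infimum can be illustrated by a direct, brute-force computation on a small taxonomy. In this sketch the order is read as "m ≤ x when m is x itself or one of its ancestors", and the `PARENT` table is an assumption: it is the tree used in the Range Minimum Query example of this talk. The function names are illustrative.

```python
# Parent of each node in the example taxonomy ("T" is the top element); assumed structure.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}


def downset(node):
    """The node together with all of its ancestors, i.e. every m with m <= node."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def ideal(antichain):
    """Order ideal generated by an antichain."""
    return set().union(*(downset(n) for n in antichain))


def infimum(ac1, ac2):
    """Maximal elements of the set of attributes lying below both antichains."""
    common = ideal(ac1) & ideal(ac2)
    return {m for m in common
            if not any(m != n and m in downset(n) for n in common)}


meet = infimum({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(meet))
```

On this tree the brute-force result agrees with the one obtained through range minimum queries for the same two antichains.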


Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:      [ 2  1  0  3  2 ]
Positions:    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
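A simple way to get constant-time queries is a sparse table. Note that this is a swapped-in, simpler technique: it needs O(n log n) preprocessing rather than the O(n) bound quoted above, and it is shown here only as a sketch. The class name `SparseTableRMQ` is hypothetical and positions are 0-based.

```python
class SparseTableRMQ:
    """Range minimum queries with O(n log n) preprocessing and O(1) time per query."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] = index of a minimum of values[i .. i + 2**k - 1]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, lo, hi):
        """Index of a minimum of values[lo..hi] (inclusive, 0-based)."""
        k = (hi - lo + 1).bit_length() - 1
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        return left if self.values[left] <= self.values[right] else right


rmq = SparseTableRMQ([2, 1, 0, 3, 2])   # the array from the slide
print(rmq.query(0, 4) + 1)              # 1-based position of the minimum over the whole array
```

Each query combines two overlapping blocks whose lengths are powers of two, which is why the answer does not depend on the length of the range.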

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
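As an illustration of the RMQ problem stated above, here is a minimal Python sketch using a sparse table. This is a swapped-in, simpler technique: it answers queries in $O(1)$ but needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted on the slide. The class name `SparseTableRMQ` is hypothetical, and positions are 0-based inside the code (the slide counts from 1).

```python
from typing import List


class SparseTableRMQ:
    """Range-minimum index over a fixed array (0-based positions)."""

    def __init__(self, values: List[int]) -> None:
        self.values = values
        n = len(values)
        # table[k][i] = position of the minimum of values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if values[left] <= values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, lo: int, hi: int) -> int:
        """Position of the minimum in values[lo..hi], both ends inclusive."""
        if lo > hi:
            lo, hi = hi, lo
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping power-of-two windows cover the whole range.
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        return left if self.values[left] <= self.values[right] else right


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(0, 4) + 1)  # 1-based position of the minimum over the whole array
print(rmq.query(3, 4) + 1)  # 1-based position of the minimum over positions 4..5
```

On the array from the slide, the first query returns the position of the value 0 and the second the position of the trailing 2.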

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$

RMQs are computed on pairs of positions taken from $A$ and $B$, for example:
$RMQ(4, 4) = 4$, giving $\{4\}$
$RMQ(4, 18) = 15$, giving $\{4, 15\}$
$RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
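The naive procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the arrays are copied from the slide, `"T"` stands for the top element ⊤, the RMQ is a plain linear scan, and the helper names (`rmq`, `is_ancestor`, `naive_intersection`) are hypothetical. The last step keeps the most specific nodes, which is how the result $\{4, 8, 15, 19, 21\}$ shrinks to $\{4, 21\}$.

```python
from itertools import product

# Euler tour of the example taxonomy, copied from the slide (positions are 1-based).
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()

FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)                      # first occurrence of each node
LAST = {node: pos for pos, node in enumerate(NODES, start=1)}  # last occurrence


def rmq(i, j):
    """1-based position of the first minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_ancestor(u, v):
    """True if u is v or lies above v in the tree."""
    return FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def naive_intersection(a, b):
    # One RMQ per pair (x, y): |A| * |B| queries in total.
    lcas = {NODES[rmq(FIRST[x], FIRST[y]) - 1] for x, y in product(a, b)}
    # Drop every node that is a strict ancestor of another candidate.
    return {u for u in lcas if not any(u != v and is_ancestor(u, v) for v in lcas)}


print(sorted(naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```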

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

⊤
  C12
    C10
      C1  C2
    C11
      C4  C5
  C6
    C13
      C7  C8  C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
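The filter-based intersection can be reproduced with a short Python sketch. The parent links below are read off the example tree, `"T"` stands for ⊤, and the function names are hypothetical; the antichain is obtained as the minimal elements of the common filter.

```python
# Parent links of the example taxonomy; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the up-sets of the antichain's elements."""
    return set().union(*(up_set(c) for c in antichain))


def minimal_elements(nodes):
    """Elements of `nodes` that have no strict descendant inside `nodes`."""
    return {u for u in nodes if not any(u != v and u in up_set(v) for v in nodes)}


common = filter_of({"C1", "C5", "C8"}) & filter_of({"C1", "C7", "C9"})
print(sorted(common))
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```

Each filter can contain up to all nodes of the taxonomy, which is why this route is more costly than working on the antichains directly.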

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements of $A$ and $B$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs on consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

RDF to Entity-Descriptions

tid | Subject | Predicate       | Object      | Provenance
t1  | s1      | dc:subject      | o11         | DBLP
t2  | s2      | dc:subject      | o16         | DBLP
t3  | s1      | rdf:type        | Publication | DBLP
t4  | o11     | rdfs:subClassOf | C1          | ACCS
t5  | o12     | rdfs:subClassOf | C2          | ACCS
t6  | C1      | rdfs:subClassOf | C10         | ACCS

RDF triples from several resources

Entities S | Descriptions d
s1         | (dc:subject, {C1, C2, C7})
s2         | (dc:subject, {C6, C8, C9})
s3         | (dc:subject, {C4, C5})
s4         | (dc:subject, {C4, C7, C8})
s5         | (dc:subject, {C8, C9})

[Figure: taxonomy rooted at ⊤ with nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

ACM Computing Classification Taxonomy $|T|$
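One possible reading of the step from triples to entity descriptions is sketched below in Python. It assumes that the description of an entity collects the taxonomy classes sitting directly above the objects of its `dc:subject` triples; this rule and the function name `entity_descriptions` are assumptions made for illustration, not something the slide spells out. Only the six sample triples are used, so the output is smaller than the description table.

```python
from collections import defaultdict

# Sample triples from the slide: (subject, predicate, object, provenance).
TRIPLES = [
    ("s1", "dc:subject", "o11", "DBLP"),
    ("s2", "dc:subject", "o16", "DBLP"),
    ("s1", "rdf:type", "Publication", "DBLP"),
    ("o11", "rdfs:subClassOf", "C1", "ACCS"),
    ("o12", "rdfs:subClassOf", "C2", "ACCS"),
    ("C1", "rdfs:subClassOf", "C10", "ACCS"),
]


def entity_descriptions(triples, predicate="dc:subject"):
    """Map each entity to the classes directly above its `predicate` values."""
    direct_class = defaultdict(set)
    for s, p, o, _ in triples:
        if p == "rdfs:subClassOf":
            direct_class[s].add(o)
    descriptions = defaultdict(set)
    for s, p, o, _ in triples:
        if p == predicate:
            # Objects without a known class contribute nothing.
            descriptions[s] |= direct_class.get(o, set())
    return dict(descriptions)


print(entity_descriptions(TRIPLES))
```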

Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model, these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
  - Extended intersection operation [Carpineto and Romano 1996]
  - Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of that array

⊤
  C12
    C10
      C1  C2
    C11
      C4  C5
  C6
    C13
      C7  C8  C9

The array D records the depth of each node in the order of a depth-first traversal of the tree that returns to a node after each of its children

D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
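The depth array can be generated with a short recursive traversal. The Python sketch below is illustrative: the `CHILDREN` map is read off the example tree, `"T"` stands for ⊤, and `euler_tour` is a hypothetical helper name. It records a node on entry and again after each child.

```python
# Children of each node in the example taxonomy; "T" stands for the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (nodes, depths) for a tour that revisits a node after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)       # back at `node` after finishing `child`
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour("T")
first = {}
for position, node in enumerate(nodes, start=1):
    first.setdefault(node, position)  # 1-based position of the first visit

print(depths)
print([first[c] for c in ("C1", "C5", "C8")])  # positions used for A on the slides
```

The least common ancestor of two nodes is then the node at the position returned by an RMQ over D between their first occurrences.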

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
$RMQ(4, 4) = 4$, $RMQ(4, 12) = 8$, $RMQ(12, 18) = 15$, $RMQ(18, 20) = 19$ and $RMQ(20, 22) = 21$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$ (positions 8 and 15 denote C12 and ⊤, which lie above C1 at position 4; positions 19 and 21 both denote C13)
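A hedged Python sketch of this improved procedure follows. The arrays are copied from the slide and `"T"` stands for ⊤. Two points are assumptions rather than something the slide states: consecutive pairs coming from the same set are skipped (the A/B tags on the merged list suggest this role), and the RMQ is a plain linear scan instead of a constant-time structure. All function names are hypothetical.

```python
# Euler tour of the example taxonomy, copied from the slide (positions are 1-based).
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()

FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)
LAST = {node: pos for pos, node in enumerate(NODES, start=1)}


def rmq(i, j):
    """1-based position of the first minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_ancestor(u, v):
    """True if u is v or lies above v in the tree."""
    return FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def improved_intersection(a, b):
    # Merge the positions of both antichains, remembering where each came from.
    tagged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[y], "B") for y in b])
    candidates = set()
    # At most |A| + |B| - 1 consecutive pairs, hence a linear number of RMQs.
    for (p, src_p), (q, src_q) in zip(tagged, tagged[1:]):
        if src_p != src_q:  # assumption: only mixed pairs are queried
            candidates.add(NODES[rmq(p, q) - 1])
    # Keep the most specific nodes only.
    return {u for u in candidates
            if not any(u != v and is_ancestor(u, v) for v in candidates)}


print(sorted(improved_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

On the slide's example every consecutive pair mixes A and B, so the five RMQs coincide with the ones listed above.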

Resulting Pattern Concept Lattice

Entities S | Descriptions d
s1         | (dc:subject, {C1, C2, C7})
s2         | (dc:subject, {C6, C8, C9})
s3         | (dc:subject, {C4, C5})
s4         | (dc:subject, {C4, C7, C8})
s5         | (dc:subject, {C8, C9})

[Figure: taxonomy rooted at ⊤ with nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

[Figure: pattern concept lattice over the concepts K0 to K11]

Pattern Concept lattice

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset         | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$   | $t_{RMQ}$ | $t_{scaling}$
DBLP            | 5293  | 33207 | 33198         | 10134   | 45s       | 21s
Biomedical Data | 63    | 1490  | 933           | 1725582 | 145s      | 162s

Experiments on Linked Data

Dataset | entities | attributes | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$   | $t_{RMQ}$ | $t_{scaling}$
BK      | 96       | 5          | 35    | 626   | 10            | 840897  | 37 sec    | 42 sec
LO      | 16       | 7          | 16    | 224   | 26            | 1875    | 0.043 sec | 0.088 sec
NT      | 131      | 3          | 131   | 140   | 6             | 128624  | 36 sec    | 68 sec
PO      | 60       | 16         | 22    | 1236  | 58            | 416837  | 49 sec    | 57 sec
PT      | 5000     | 49         | 22    | 4084  | 60            | 452316  | 50 sec    | 38 sec
PW      | 200      | 11         | 94    | 436   | 21            | 1148656 | 60 sec    | 49 sec
PY      | 74       | 28         | 36    | 340   | 53            | 771569  | 46 sec    | 40 sec
QU      | 2178     | 4          | 44    | 8212  | 8             | 783013  | 28 sec    | 30 sec
TZ      | 186      | 61         | 31    | 626   | 88            | 650041  | 58 sec    | 43 sec
VY      | 52       | 4          | 52    | 202   | 15            | 202666  | 59 sec    | 116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice over the concepts K0 to K11]

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015

Roadmap

Building a classification structure over Web of Data
- Classifying RDF Data
- Creating Views through SPARQL Queries

Ways to utilize this structure
- Data Completion
- Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification

[Figure: instantiated graph: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword           | ?author
title1 | RDF                | author1
title2 | Web Crawling       | author1
title2 | Web Search Engines | author2
title2 | RDF                | author1
title2 | RDF                | author2
title2 | Web Crawling       | author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \dots\}$

• $M_1 = \mu(A_v^1) = \{author1, author2, \dots\}$

• $M_2 = \mu(A_v^2) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014
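The effect of VIEW BY can be sketched outside SPARQL with a few lines of Python. This is an illustration only: the rows are the result tuples listed on the slide, the function name `view_by` is hypothetical, and the sketch simply groups the values of the remaining columns (the attributes) under each value of the chosen column (the objects). It does not reproduce the authors' implementation.

```python
from collections import defaultdict

# Result rows of the example query, in (title, keyword, author) order.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, object_column):
    """Group the other columns' values under each value of `object_column`."""
    idx = columns.index(object_column)
    context = defaultdict(set)
    for row in rows:
        for j, value in enumerate(row):
            if j != idx:
                context[row[idx]].add((columns[j], value))
    return dict(context)


by_title = view_by(ROWS, COLUMNS, "title")    # objects are titles
by_author = view_by(ROWS, COLUMNS, "author")  # objects are authors
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Switching the `object_column` argument changes which column supplies the objects of the context, so the same result set can be viewed by title or by author.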

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern: ?x has dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

[Figure: graph pattern: ?x has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates             | Objects
Index | URI            | Index | URI
A     | dc:subject     | a     | dbpc:Computer_Scientists
      |                | b     | dbpc:Turing_Award_Laureates
B     | dbp:award      | c     | dbp:TuringAward
C     | rdf:type       | d     | dbo:Scientist
D     | dbp:field      | e     | dbp:Computer Sciences
E     | dbp:birthPlace | f     | dbo:UnitedStates
      |                | g     | dbo:UnitedKingdom

Columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1 | × × × × × ×
Person2 | × × × × ×
Person3 | × × × × ×
Person4 | × × × ×
Person5 | × × × ×
Person6 | × ×
Person7 | × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>
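A minimal Python sketch of this construction is given below. It treats every distinct (predicate, object) pair as one attribute, which is one straightforward way to realise the scaling described above; the function name `formal_context` is hypothetical and only the four Person1 triples from the slide are used.

```python
# Triples of Person1 from the slide, as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def formal_context(triples):
    """Return (objects, attributes, incidence); an attribute is a (predicate, object) pair."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


objects, attributes, incidence = formal_context(TRIPLES)
for g in objects:
    # One cross per triple whose subject is g.
    row = ["x" if (g, m) in incidence else "." for m in attributes]
    print(g, " ".join(row))
```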

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                    | Confidence | Support | Meaning
$d \Rightarrow c$       | 100%       | 5       | Every scientist has won a Turing Award
$c \Rightarrow d$       | 100%       | 5       | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$    | 100%       | 2       | Everyone whose field is computer science is a Turing Award-winning scientist
$c, d \Rightarrow a, b$ | 100%       | 5       | All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$         | 71%        | 7       | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
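The confidence figure and the entities flagged for completion can be checked with a small Python sketch. The context below keeps only attributes a, b, c and d, and their placement is inferred from the supports in the rule table (all seven persons have a and b, five have c and d); attributes e, f and g are left out. The function names are hypothetical.

```python
# Attributes a, b, c, d of the running example, inferred from the rule supports.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def support(itemset):
    """Number of entities whose description contains every attribute of `itemset`."""
    return sum(1 for attrs in CONTEXT.values() if itemset <= attrs)


def confidence(premise, conclusion):
    """Share of the entities satisfying `premise` that also satisfy `conclusion`."""
    return support(premise | conclusion) / support(premise)


def needs_completion(premise, conclusion):
    """Entities that satisfy `premise` but miss part of `conclusion`."""
    return sorted(g for g, attrs in CONTEXT.items()
                  if premise <= attrs and not conclusion <= attrs)


print(round(confidence({"a", "b"}, {"c", "d"}), 2))  # 0.71
print(confidence({"c", "d"}, {"a", "b"}))            # 1.0
print(needs_completion({"a", "b"}, {"c", "d"}))      # ['Person6', 'Person7']
```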

Possible Scenario [Alam et al., IJCAI 2015]

[Flowchart with the steps: Reference Universe, Formal Context, Mining Implications, Ranking Implications, "Can a rule be a definition $X \equiv Y$?", Yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015

Experimentation

Dataset building conditions

Dataset     | Cars                        | Videogames          | Smartphones                 | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates  | rdf:type                    | rdf:type            | rdf:type                    | rdf:type
            | dc:subject                  | dc:subject          | dc:subject                  | dc:subject
            | bodyStyle                   | cp                  | manufacturer                | language
            | transmission                | developer           | operativeSystem             | govenmentType
            | assembly                    | requirement         | developer                   | leaderType
            | designer                    | genre               | cpu                         | foundingDate
            | layout                      | releaseDate         |                             | gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset       | Cars  | Videogames | Smartphones | Countries
Subjects      | 529   | 655        | 363         | 3153
Objects       | 1291  | 3265       | 495         | 8315
Triples       | 12519 | 20146      | 4710        | 50000
Concepts      | 14657 | 31031      | 1232        | 13754
Exec time [s] | 1732  | 1714       | 0.7         | 5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) versus recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
- Completing RDF-Data using Association Rule Mining
- Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. $X \to Y$ and $Y \to X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8References

References I

Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis

Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754

Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer

Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 56

References II

Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial

Intelligence pages 13421347

Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13

2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer

Mehwish Alam Interactive Knowledge Discovery over Web of Data 57

References III

Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June

23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer

Science pages 153168 Springer

Rudolph S (2006)Relational exploration combining description logics and formal concept

analysis for knowledge specicationPhD thesis Dresden University of Technology

Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 58

References IV

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT: rdf:type dbo:Automobile; dbo:manufacturer dbp:Lamborghini; dc:subject dbpc:Sports_cars and dbpc:Lamborghini_vehicles; dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values

Columns: KA, KB, KC, KD, KE (binary attributes a b c d e f g) and K_dbo:productionStartYear

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
- $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
- $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structures $(G, H, \Delta)$:
- $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Columns: KA, KB, KC, KD, KE (binary attributes a b c d e f g) and K_dbo:productionStartYear

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\circ\circ}$
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
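The interval part of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `START_YEAR`, `meet`, `describe` and `extent` are hypothetical, the years are the ones listed in the context above, and the similarity of two intervals is taken to be their convex hull, as is usual for interval pattern structures.

```python
from functools import reduce

# Production start years copied from the context above.
# Aventador Roadster has no value ("-") and is therefore left out.
START_YEAR = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def describe(entities):
    """Common interval description of a non-empty set of entities."""
    return reduce(meet, ((START_YEAR[g], START_YEAR[g]) for g in entities))

def extent(interval):
    """All entities whose year falls inside the interval."""
    lo, hi = interval
    return {g for g, year in START_YEAR.items() if lo <= year <= hi}

A1 = {"350GT", "400GT", "Islero"}
print(describe(A1))          # interval description of A1
print(extent(describe(A1)))  # closure of A1 with respect to the interval part
```

Applying `describe` and then `extent` to A1 returns A1 itself, which corresponds to the step $(A_1)'' = A_1$ on the slide.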

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x               x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: subsumption hierarchy over the nodes ⊤, C1, C2 and C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the Range Minimum Query (RMQ) problem

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:      [ 2 1 0 3 2 ]
Positions:    1 2 3 4 5

Computational Time

Finding such a position takes $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
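As a rough illustration of constant-time range minimum queries, the sketch below uses a sparse table. Note the assumptions: this is a swapped-in textbook structure with $O(n \log n)$ preprocessing, not the $O(n)$ preprocessing scheme mentioned on the slide; the function names are hypothetical; and indices are 0-based, so position 3 on the slide is index 2 here.

```python
def build_sparse_table(values):
    """table[k][i] holds the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """Index of the minimal value in values[lo..hi] (inclusive, 0-based)."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

values = [2, 1, 0, 3, 2]            # the array from the slide
table = build_sparse_table(values)
print(rmq(values, table, 0, 4) + 1)  # 1-based position of the minimum over the whole array
```

Each query combines two overlapping power-of-two blocks that together cover the range, which is why no loop is needed at query time.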

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

An RMQ is computed for pairs of positions taken from A and B, and the results are accumulated:

- $RMQ(4, 4) = 4$, giving $\{4\}$
- $RMQ(4, 18) = 15$, giving $\{4, 15\}$
- $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
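The scaling to filters can be sketched as follows. This is a minimal illustration, assuming the tree is stored as a hypothetical child-to-parent map `PARENT` (with `"T"` standing for ⊤) whose edges are read off the Euler tour of the taxonomy; the helper names are not taken from the original work.

```python
# Child -> parent map of the example taxonomy; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def filter_of(antichain):
    """All nodes that are equal to or above some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result

def most_specific(nodes):
    """Drop every node that is a proper ancestor of another node in the set."""
    ancestors = set()
    for node in nodes:
        parent = PARENT.get(node)
        while parent is not None:
            ancestors.add(parent)
            parent = PARENT.get(parent)
    return nodes - ancestors

def intersect_by_scaling(a, b):
    """Intersect two antichains through their filters."""
    return most_specific(filter_of(a) & filter_of(b))

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect_by_scaling(A, B)))
```

Every filter can contain a large part of the tree, which is the reason the slides argue that this route is more expensive than intersecting the antichains directly.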

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still computationally more efficient than computing the intersection of antichains in a poset by means of a scaling


Structured Attribute Sets

• Each subject has a structured set of attributes

• In our model these structured sets of attributes form a taxonomy

How to find similarity in the case of Structured Attribute Sets?

• Scaling
• Intersection of antichains
- Extended intersection operation [Carpineto and Romano 1996]
- Proposition of [Ganter and Kuznetsov 2001] for structured attributes
• Range Minimum Query - An Implementation

Revisiting Pattern Structures for Structured Attribute Sets. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev. International Conference on Concept Lattices and Their Applications, 2015.

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of this array.

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

Traversing the tree depth-first and recording the depth D of each visited node gives:

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

$RMQ(4, 4) = 4$, $RMQ(4, 12) = 8$, $RMQ(12, 18) = 15$, $RMQ(18, 20) = 19$ and $RMQ(20, 22) = 21$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
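The procedure on this slide can be reproduced with a short sketch. Several points are assumptions made for illustration: the tree edges in `TREE` are read off the Euler tour (with `"T"` standing for ⊤), the function names are hypothetical, the range minimum is found by a plain linear scan rather than a constant-time structure, only consecutive positions that come from different sets are queried, and positions are 0-based (slide position 4 is index 3).

```python
# Parent -> children map of the example taxonomy; "T" stands for the top element.
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(tree, root):
    """Depth-first tour recording each visited node and its depth."""
    labels, depths = [], []
    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            labels.append(node)   # come back to the parent after each child
            depths.append(depth)
    visit(root, 0)
    return labels, depths

def intersect_antichains(a, b, labels, depths):
    """Most specific common generalizations of two antichains of a tree."""
    first, last = {}, {}
    for pos, label in enumerate(labels):
        first.setdefault(label, pos)
        last[label] = pos
    merged = sorted([(first[x], "A") for x in a] + [(first[x], "B") for x in b])
    candidates = set()
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:
            # Linear-scan RMQ over depths[p..q]; its position is the LCA.
            lca_pos = min(range(p, q + 1), key=depths.__getitem__)
            candidates.add(labels[lca_pos])
    def is_proper_ancestor(u, v):
        return u != v and first[u] <= first[v] and last[v] <= last[u]
    return {c for c in candidates
            if not any(is_proper_ancestor(c, d) for d in candidates)}

labels, depths = euler_tour(TREE, "T")
print(sorted(intersect_antichains({"C1", "C5", "C8"}, {"C1", "C7", "C9"}, labels, depths)))
```

The final filtering step corresponds to the reduction from {4, 8, 15, 19, 21} to {4, 21} on the slide: positions 8 and 15 are ancestors of the more specific results, and positions 19 and 21 denote the same node C13.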

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: subsumption hierarchy over the nodes ⊤, C1, C2 and C4 to C15]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|    |T|     |Leaves(T)|  |L|       t_RMQ  t_scaling
DBLP             5293   33207   33198        10134     45s    21s
Biomedical Data  63     1490    933          1725582   145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|   |T|    |Leaves(T)|  |L|       t_RMQ      t_scaling
BK       96        5           35    626    10           840897    37 sec     42 sec
LO       16        7           16    224    26           1875      0.043 sec  0.088 sec
NT       131       3           131   140    6            128624    36 sec     68 sec
PO       60        16          22    1236   58           416837    49 sec     57 sec
PT       5000      49          22    4084   60           452316    50 sec     38 sec
PW       200       11          94    436    21           1148656   60 sec     49 sec
PY       74        28          36    340    53           771569    46 sec     40 sec
QU       2178      4           44    8212   8            783013    28 sec     30 sec
TZ       186       61          31    626    88           650041    58 sec     43 sec
VY       52        4           52    202    15           202666    59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
• Not all datasets could be processed using the scaling approach ()

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: the graph pattern instantiated by a mapping: the paper NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword            ?author
title1   RDF                 author1
title2   Web Crawling        author1
title2   Web Search Engines  author2
title2   RDF                 author1
title2   RDF                 author2
title2   Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
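The effect of VIEW BY can be emulated outside the query engine. The sketch below is only an illustration, not the authors' implementation: `ROWS` copies the result table from the slide, while `view_by` and the `column=value` attribute naming are hypothetical choices. The chosen column supplies the objects of the formal context and the remaining columns supply the attributes.

```python
COLUMNS = ("title", "keyword", "author")

# Result rows as listed on the slide.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]

def view_by(rows, columns, key):
    """Group rows by `key`; the other bindings become attributes of the object."""
    context = {}
    for row in rows:
        record = dict(zip(columns, row))
        obj = record.pop(key)
        attrs = context.setdefault(obj, set())
        attrs.update(f"{col}={value}" for col, value in record.items())
    return context

by_title = view_by(ROWS, COLUMNS, "title")    # VIEW BY ?title
by_author = view_by(ROWS, COLUMNS, "author")  # VIEW BY ?author
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Switching the `key` argument gives the same answers organized from a different perspective, which is the idea behind defining several views over one query.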

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x rdf:type Person; ?x dbp:birthPlace Berlin; ?x dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern: ?x dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

Problem Statement

French Films, expressed through type and country

[Figure: graph pattern: ?x dcterms:subject FrenchFilm; ?x rdf:type Film; ?x dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates
Index  URI
A      dc:subject
B      dbp:award
C      rdf:type
D      dbp:field
E      dbp:birthPlace

Objects
Index  URI
a      dbpc:Computer_Scientists
b      dbpc:Turing_Award_Laureates
c      dbp:TuringAward
d      dbo:Scientist
e      dbp:Computer Sciences
f      dbo:UnitedStates
g      dbo:UnitedKingdom

Formal context (predicates A B C D E over the objects a b c d e f g)

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule        Confidence  Support  Meaning
d ⇒ c       100%        5        Every scientist has won a Turing Award
c ⇒ d       100%        5        Every person who has won a Turing Award is a scientist
e ⇒ c, d    100%        2        Every person with the field computer science is a scientist who has won a Turing Award
c, d ⇒ a, b 100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d 71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example
- $c, d \Rightarrow a, b$
- $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their descriptions
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
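The confidence values in the table can be recomputed with a small sketch. The context below is an assumption made for illustration: it is one assignment of crosses that agrees with the supports and confidences stated on the slides (five entities with a, b, c, d, two of them with e, and two entities with only a, b), and the exact placement of e and f is hypothetical. The function names are likewise not from the original work.

```python
# Hypothetical formal context consistent with the counts on the slides.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "f"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attributes):
    """Entities that have every attribute in the given set."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}

def confidence(premise, conclusion):
    """Share of the entities matching the premise that also match the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))

def needs_completion(premise, conclusion):
    """Entities matching the premise but missing part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)

print(round(confidence({"a", "b"}, {"c", "d"}), 2))   # confidence of a, b -> c, d
print(confidence({"c", "d"}, {"a", "b"}))             # confidence of c, d -> a, b
print(sorted(needs_completion({"a", "b"}, {"c", "d"})))
```

When one direction is an implication and the other is only a high-confidence rule, the entities returned by `needs_completion` are the candidates whose descriptions could be completed.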

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow: Reference Universe, Formal Context, Mining Implications, Ranking Implications, the decision "Can a rule be a definition X ≡ Y?", and on "Yes", Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        0.7          5982

Dataset characteristics

Evaluation

• No ground truth exists for the triples to be added to each dataset
• Human evaluation is used as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may contain different data types and structures, including dates, numbers, collections and strings
• To deal with such data, we use Pattern Structures

[Figure: RDF graph with the nodes 350GT, dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and 1964 (xsd:date), and the edge labels dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]

Why Pattern Structures

K_A K_B K_C K_D K_E K_dbo:productionStartYear
a b c d e f g

Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

In the heterogeneous context with numeric values, let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
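The closure computed on this slide can be replayed with a minimal Python sketch. The placement of the crosses is an assumption (the column alignment of the context is not recoverable from the slide), chosen only so that the derivations match the results stated above; the function names are hypothetical.

```python
# Assumed toy context: cross placement is hypothetical, chosen to be
# consistent with the derivations stated on the slide.
BINARY = {
    "Reventon": {"a", "b", "c", "d", "e", "f"},
    "Countach": {"a", "b", "c", "d", "e"},
    "350GT": {"a", "b", "c", "d", "f"},
    "400GT": {"a", "b", "c", "d"},
    "Islero": {"a", "b", "c", "d"},
    "Veneno": {"a", "b"},
    "Aventador Roadster": {"a", "c"},
}
# Production start years from the slide; Aventador Roadster has no value.
YEAR = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
        "400GT": 1965, "Islero": 1967, "Veneno": 2012}


def common_attributes(entities):
    """Binary derivation: attributes shared by all the given entities."""
    sets = [BINARY[g] for g in entities]
    return set.intersection(*sets) if sets else set()


def entities_with(attributes):
    """Binary derivation: entities having all the given attributes."""
    return {g for g, attrs in BINARY.items() if attributes <= attrs}


def interval_of(entities):
    """Interval derivation: smallest interval covering the entities' years."""
    years = [YEAR[g] for g in entities if g in YEAR]
    return (min(years), max(years))


def entities_in(interval):
    """Interval derivation: entities whose year lies inside the interval."""
    low, high = interval
    return {g for g, year in YEAR.items() if low <= year <= high}


def heterogeneous_closure(entities):
    """Intersect the closures obtained from the binary and the numeric part."""
    return entities_with(common_attributes(entities)) & entities_in(interval_of(entities))


A1 = {"350GT", "400GT", "Islero"}
print(common_attributes(A1), interval_of(A1), heterogeneous_closure(A1))
```

With this assumed context, the binary closure alone adds Reventon and Countach, and the interval [1963, 1967] removes them again, which is why the heterogeneous closure is A1 itself.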

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: subsumption hierarchy over ⊤, C1, C2 and C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M, x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2, m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array).
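As an illustration of the preprocess-then-query idea, here is a minimal sketch using a sparse table. This is a swapped-in, simpler technique: it answers queries in O(1) but needs O(n log n) preprocessing, not the O(n) bound quoted on the slide. The function names are hypothetical and indices are 0-based, whereas the slide counts positions from 1.

```python
def build_sparse_table(values):
    """table[k][i] holds the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            # Keep the leftmost index on ties.
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the minimum of values[lo..hi] (both ends included)."""
    if lo > hi:
        lo, hi = hi, lo
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 0, 4) + 1)  # 1-based position of the minimum over the whole array
```

For the array of the slide, the minimum over positions 1 to 5 is the value 0 at position 3.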

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D:        [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Node:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

An RMQ is computed for pairs of positions taken from A and B, and the result set grows step by step:
• RMQ(4, 4) = 4, giving {4}
• RMQ(4, 18) = 15, giving {4, 15}
• RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
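The scaling to filters can be sketched in a few lines of Python. The parent map below encodes the tree of this slide, with "TOP" standing for ⊤; the function names are hypothetical and the code is only an illustration of the definition, not the implementation used in the experiments.

```python
# Parent of each class in the tree of the slide ("TOP" stands for the root).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Fil(X): every class that is larger than or equal to some element of X."""
    result = set()
    for node in antichain:
        result |= up_set(node)
    return result


def most_specific(nodes):
    """Drop every class that is a proper ancestor of another class in the set."""
    return {n for n in nodes if not any(m != n and n in up_set(m) for m in nodes)}


def intersect_by_scaling(a, b):
    """Intersect two antichains through their filters."""
    return most_specific(filter_of(a) & filter_of(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect_by_scaling(A, B)))
```

Each filter can contain up to all the nodes of the tree, which is the cost that the RMQ-based approach avoids.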

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

The array D is filled step by step during a depth-first traversal of the tree: every visit of a node, including each return to it from a child, appends the depth of that node.

D:        [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Node:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
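The construction of the array D can be sketched as a depth-first Euler tour. The child lists below encode the tree of the slide, with "TOP" standing for ⊤; the helper names are hypothetical, and the range minimum is found by a plain linear scan to keep the sketch short.

```python
# Children of each inner node of the tree on the slide ("TOP" is the root).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths, recorded at every visit."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            # Coming back from a child counts as a new visit of the parent.
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, x, y):
    """Least common ancestor: the shallowest node between the first visits of x and y."""
    i, j = sorted((nodes.index(x), nodes.index(y)))
    k = min(range(i, j + 1), key=depths.__getitem__)  # linear-scan RMQ
    return nodes[k]


NODES, DEPTHS = euler_tour("TOP")
print(DEPTHS)
print(lca(NODES, DEPTHS, "C7", "C9"), lca(NODES, DEPTHS, "C5", "C7"))
```

The tour has 25 entries for this tree, and C1 is first visited at position 4 when positions are counted from 1, as on the slide.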

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

D:        [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Node:     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}

Positions 8 (C12) and 15 (⊤) are dropped because these classes subsume C1, and positions 19 and 21 both denote C13, which leaves the antichain {C1, C13}.
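The steps of this slide can be replayed with a short Python sketch. The two arrays are copied from the slide ("TOP" stands for ⊤); the function names and the way dominated positions are filtered out are assumptions made for illustration, and the RMQ is a linear scan instead of a constant-time structure.

```python
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()


def rmq(i, j):
    """1-based position of the smallest depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])


def candidate_positions(a, b):
    """Merge both position lists and query every pair of consecutive positions."""
    merged = sorted(list(a) + list(b))
    return [rmq(x, y) for x, y in zip(merged, merged[1:])]


def keep_most_specific(positions):
    """Drop positions whose node subsumes (or repeats) the node of another position."""
    kept = []
    for p in positions:
        dominated = False
        for q in positions:
            if q == p:
                continue
            # The node at p is an ancestor of, or equal to, the node at q.
            covers = DEPTHS[rmq(p, q) - 1] == DEPTHS[p - 1]
            deeper = DEPTHS[q - 1] > DEPTHS[p - 1]
            same_node_later = DEPTHS[q - 1] == DEPTHS[p - 1] and q > p
            if covers and (deeper or same_node_later):
                dominated = True
                break
        if not dominated:
            kept.append(p)
    return kept


A_POS, B_POS = [4, 12, 20], [4, 18, 22]
CANDIDATES = candidate_positions(A_POS, B_POS)
RESULT = keep_most_specific(CANDIDATES)
print(CANDIDATES, RESULT, [NODES[p - 1] for p in RESULT])
```

Only five range queries are issued here, one per pair of consecutive positions in the merged list, compared with nine pairs for the naive approach on the same input.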

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: subsumption hierarchy over ⊤, C1, C2 and C4 to C15]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, with ?paper linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu: V \rightarrow U$

Given U and V, a mapping $\mu$ is a partial function $\mu: V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: one answer of the query, with NapoliLS97 linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title_1, title_2, \ldots\}$
• $M_1 = \mu(A_{v_1}) = \{author_1, author_2, \ldots\}$
• $M_2 = \mu(A_{v_2}) = \{WebCrawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
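How a VIEW BY clause turns query answers into formal contexts can be sketched as follows. The rows are the distinct answers listed on the slide; the function name `view_by` and the dictionary layout are hypothetical and only illustrate the idea that the chosen variable provides the objects while every other variable provides one attribute set.

```python
from collections import defaultdict

# Distinct answers of the query, one dictionary per row of the result table.
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Search Engines", "author": "author2"},
    {"title": "title2", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, object_var):
    """Build one context per remaining variable: object -> set of attribute values."""
    contexts = defaultdict(lambda: defaultdict(set))
    for row in rows:
        obj = row[object_var]
        for var, value in row.items():
            if var != object_var:
                contexts[var][obj].add(value)
    return {var: dict(ctx) for var, ctx in contexts.items()}


BY_TITLE = view_by(ROWS, "title")    # objects are titles
BY_AUTHOR = view_by(ROWS, "author")  # same answers, seen from the authors
print(BY_TITLE["author"])
print(BY_AUTHOR["keyword"])
```

Changing the argument from "title" to "author" regroups the same answers around a different set of objects, which is the change of perspective a VIEW BY clause expresses.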

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a value ≤ 1900 by dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x linked to FrenchFilm by dcterms:subject; a second step adds the links to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer_Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples, after scaling, from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
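The scaling step described above can be sketched in a few lines of Python. This is only an illustration, assuming that each distinct (predicate, object) pair becomes one binary attribute; the function name `build_context` is hypothetical and the triples are the four listed for Person1.

```python
from collections import defaultdict

# The four triples listed on the slide for Person1.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def build_context(triples):
    """Map each subject to its set of (predicate, object) attributes.

    Assumption of this sketch: one binary attribute per distinct
    (predicate, object) pair, so every triple yields one cross.
    """
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


context = build_context(TRIPLES)
for subject, attributes in context.items():
    print(subject, sorted(attributes))
```

Each entry of `context` is one row of the cross table; the attribute set of a row plays the role of the crosses.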

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$     100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $\{a, b\} \equiv \{c, d\}$, i.e. $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$
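As a rough illustration of how the confidence of 0.71 arises, the Python sketch below counts supports over a toy context. The context is an assumption chosen only to agree with the supports on the slide (seven persons carrying a and b, five of them also carrying c and d); attributes e, f, g are left out, and the helper names are hypothetical.

```python
# Toy context, assumed for illustration: only attributes a-d are modelled.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def support(itemset):
    """Number of entities whose description contains every attribute of itemset."""
    return sum(1 for attrs in CONTEXT.values() if itemset <= attrs)


def confidence(premise, conclusion):
    """Confidence of the rule premise -> conclusion."""
    return support(premise | conclusion) / support(premise)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but not the conclusion."""
    return sorted(g for g, attrs in CONTEXT.items()
                  if premise <= attrs and not conclusion <= attrs)


print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(confidence({"c", "d"}, {"a", "b"}))
print(needs_completion({"a", "b"}, {"c", "d"}))
```

One direction holds for every entity while the other holds for 5 of 7, which is the pattern that suggests a definition with missing triples.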

Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames    Smartphones       Countries
Restriction  dc:subject        dc:subject    dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type      rdf:type          rdf:type
             dc:subject        dc:subject    dc:subject        dc:subject
             bodyStyle         cp            manufacturer      language
             transmission      developer     operativeSystem   governmentType
             assembly          requirement   developer         leaderType
             designer          genre         cpu               foundingDate
             layout            releaseDate                     gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries, and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  • Completing RDF-Data using Association Rule Mining
  • Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments

Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, strings

• To deal with such data, we use Pattern Structures

Example: description of 350GT

350GT  dc:subject          dbpc:Sports_cars
350GT  dc:subject          dbpc:Lamborghini_vehicles
350GT  rdf:type            dbo:Automobile
350GT  dbo:manufacturer    dbp:Lamborghini
350GT  dbo:productionYear  1964 (xsd:date)

Why Pattern Structures

Columns: K_A (a, b), K_B (c), K_C (d), K_D (e), K_E (f, g), K_dbo:productionStartYear

Entity              Binary attributes  K_dbo:productionStartYear
Reventon            × × × × × ×        ⟨[2008, 2008]⟩
Countach            × × × × ×          ⟨[1974, 1974]⟩
350GT               × × × × ×          ⟨[1963, 1963]⟩
400GT               × × × ×            ⟨[1965, 1965]⟩
Islero              × × × ×            ⟨[1967, 1967]⟩
Veneno              × ×                ⟨[2012, 2012]⟩
Aventador Roadster  × ×                -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Columns: K_A (a, b), K_B (c), K_C (d), K_D (e), K_E (f, g), K_dbo:productionStartYear

Entity              Binary attributes  K_dbo:productionStartYear
Reventon            × × × × × ×        ⟨[2008, 2008]⟩
Countach            × × × × ×          ⟨[1974, 1974]⟩
350GT               × × × × ×          ⟨[1963, 1963]⟩
400GT               × × × ×            ⟨[1965, 1965]⟩
Islero              × × × ×            ⟨[1967, 1967]⟩
Veneno              × ×                ⟨[2012, 2012]⟩
Aventador Roadster  × ×                -

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
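The closure computed on this slide can be replayed with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: only attributes a to d are modelled, Veneno and Aventador Roadster are assumed to carry exactly a and b, and all function names are hypothetical. The years come from the table.

```python
# Binary part of the context (assumption: only a-d are modelled, and the
# two-cross rows are taken to be {a, b}).
BINARY = {
    "Reventon": {"a", "b", "c", "d"},
    "Countach": {"a", "b", "c", "d"},
    "350GT": {"a", "b", "c", "d"},
    "400GT": {"a", "b", "c", "d"},
    "Islero": {"a", "b", "c", "d"},
    "Veneno": {"a", "b"},
    "Aventador Roadster": {"a", "b"},
}
# Numeric part; Aventador Roadster has no production start year.
YEAR = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
        "400GT": 1965, "Islero": 1967, "Veneno": 2012}


def binary_closure(objects):
    """Shared attributes of objects, and all entities carrying them."""
    shared = set.intersection(*(BINARY[g] for g in objects))
    extent = {g for g, attrs in BINARY.items() if shared <= attrs}
    return extent, shared


def interval_closure(objects):
    """Smallest interval covering the years of objects, and its extent."""
    years = [YEAR[g] for g in objects]
    low, high = min(years), max(years)
    extent = {g for g, y in YEAR.items() if low <= y <= high}
    return extent, (low, high)


def heterogeneous_closure(objects):
    """Intersect the extents obtained in the binary and numeric parts."""
    ext_bin, shared = binary_closure(objects)
    ext_num, interval = interval_closure(objects)
    return ext_bin & ext_num, (shared, interval)


A1 = {"350GT", "400GT", "Islero"}
extent, intent = heterogeneous_closure(A1)
print(sorted(extent), intent)
```

The binary part alone closes to five cars, and the interval part narrows the extent back to the three cars of $A_1$.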

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: subsumption hierarchy rooted at ⊤ over the nodes C1, C2, and C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set where elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:      [ 2  1  0  3  2 ]
Positions:    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
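For readers who want to experiment with RMQ, here is a minimal Python sketch. Note that it uses a sparse table, which is a simpler technique than the $O(n)$-preprocessing method the slide refers to: preprocessing here costs $O(n \log n)$, while each query is still $O(1)$. Function names are hypothetical and positions are 0-based in the code.

```python
def build_sparse_table(values):
    """table[j][i] holds the index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the minimum of values[lo..hi], both ends included."""
    j = (hi - lo + 1).bit_length() - 1
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]          # the array from the slide
table = build_sparse_table(values)
print(rmq(values, table, 0, 4) + 1)  # 1-based position of the minimum
```

A query covers its range with two overlapping power-of-two blocks, which is why no loop is needed at query time.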

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ as positions in the array below

D         0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0
Node      ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
Position  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25

An RMQ is computed for pairs made of one position from $A$ and one position from $B$, for example:

RMQ(4, 4) = 4
RMQ(4, 18) = 15
RMQ(20, 18) = 19

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
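The filter-based intersection can be sketched in Python as follows. The parent map encodes the tree used in this example; the string "TOP" stands for ⊤, and the function names are hypothetical. This is a plain illustration of the definition, with no attention paid to efficiency.

```python
# Parent of each node in the example tree; "TOP" stands for the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the up-sets of the antichain's elements."""
    return set().union(*(up_set(x) for x in antichain))


def most_specific(nodes):
    """Drop every node that is a proper ancestor of another node in the set."""
    return {n for n in nodes
            if not any(m != n and n in up_set(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common), sorted(most_specific(common)))
```

Because each filter can contain a large part of the tree, this approach touches many more nodes than the antichains themselves contain.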

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than $O(|Leaves(T)|)$ for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array, a Range Minimum Query finds the minimal value in a sub-array of an array of comparable objects

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

The array D records the depth of each node met during a depth-first traversal of the tree, a node being recorded again each time the traversal returns to it

D         0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0
Node      ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
Position  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
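The depth array of this slide can be reproduced with a short depth-first traversal. The Python sketch below is illustrative: the child lists encode the example tree, "TOP" stands for ⊤, and `euler_tour` is a hypothetical name.

```python
# Children of each inner node of the example tree; "TOP" stands for the root.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths, in traversal order.

    A node is recorded when it is first reached and again after each
    of its children has been fully explored.
    """
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour("TOP")
print(depths)
print(nodes)
```

With this array in hand, the lowest common ancestor of two nodes is the node of minimal depth between their positions, which is exactly a range minimum query.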

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

D         0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0
Node      ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
Position  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19, and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
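The steps above can be replayed with the Python sketch below. Several points are assumptions of this sketch rather than statements from the slides: the RMQ is a linear scan kept for readability (not a constant-time structure), only consecutive positions coming from different sets are queried (on this example that is every consecutive pair), and the final reduction keeps the most specific nodes by dropping any candidate that is an ancestor of another candidate. "TOP" stands for ⊤ and all names are hypothetical.

```python
# Euler tour of the example tree, copied from the slide ("TOP" is the root).
NODES = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last 1-based position of every node in the tour.
FIRST, LAST = {}, {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)
    LAST[node] = pos


def rmq(lo, hi):
    """1-based position of the minimal depth in D[lo..hi] (linear scan)."""
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_ancestor(u, v):
    """True if u is v or an ancestor of v."""
    return FIRST[u] <= FIRST[v] <= LAST[u]


def intersect(a, b):
    """Return the queried RMQ positions and the resulting antichain."""
    merged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[x], "B") for x in b])
    positions = [rmq(p, q)
                 for (p, src_p), (q, src_q) in zip(merged, merged[1:])
                 if src_p != src_q]
    candidates = {NODES[p - 1] for p in positions}
    antichain = {u for u in candidates
                 if not any(u != v and is_ancestor(u, v) for v in candidates)}
    return positions, antichain


positions, antichain = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(positions, sorted(antichain))
```

On the example the candidate positions are those listed on the slide, and the reduction leaves positions 4 and 21, i.e. C1 and C13.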

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: subsumption hierarchy rooted at ⊤ over the nodes C1, C2, and C4 to C15]

[Figure: pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s

Biomedical Data 63 1490 933 1725582 145s 162s

Experiments on Linked Data

Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec

NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec

PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec

Experiments with numerical data from Bilkent University

sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q

1Ď pm2q

1 means that m1 ď m2

sbquo Not all datasets could be processed using scaling approach ()

What we did so far

How to allow feedback from domain expert

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

4Creating Views over RDF-Graphs


SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification.

[Figure: the graph pattern instantiated by a mapping: NapoliLS97 linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]
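A minimal Python sketch of how a partial mapping µ can instantiate the graph pattern of the query. The dictionary `mu`, the helper `apply_mapping` and the `?` prefix for variables are illustrative assumptions; only the example values (NapoliLS97, Amedeo_Napoli, title1, Classification) come from the slide.

```python
# Graph pattern of the query: (subject, predicate, object), '?' marks a variable.
PATTERN = [
    ("?paper", "dc:creator", "?author"),
    ("?paper", "dc:title", "?title"),
    ("?paper", "dc:subject", "?keywords"),
]

# A mapping mu: V -> U written as a dict. It is partial: variables may be absent.
mu = {
    "?paper": "NapoliLS97",
    "?author": "Amedeo_Napoli",
    "?title": "title1",
    "?keywords": "Classification",
}


def apply_mapping(pattern, mapping):
    """Replace every variable covered by the mapping; leave the others untouched."""
    return [tuple(mapping.get(term, term) for term in triple) for triple in pattern]


triples = apply_mapping(PATTERN, mu)
for triple in triples:
    print(triple)
```

Because the mapping is only partial, a variable missing from `mu` simply stays in place, which is why `dict.get` with the term itself as default is used here.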


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \dots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$
• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
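The effect of VIEW BY can be sketched in Python: the answer tuples are grouped by the chosen variable, whose values become the objects, while the values of the remaining variables become attributes. This is a sketch under assumptions, not the LBVA implementation; the function name `view_by` and the encoding of attributes as `(variable, value)` pairs are hypothetical, and the rows are the ones listed on the slide.

```python
from collections import defaultdict

COLUMNS = ("title", "keyword", "author")

# Answer tuples of the query, as listed on the slide (duplicates removed).
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows, columns, key):
    """Group the answer by the VIEW BY variable; other columns yield attributes."""
    k = columns.index(key)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != k:
                context[row[k]].add((columns[i], value))
    return dict(context)


by_title = view_by(ROWS, COLUMNS, "title")    # objects are titles
by_author = view_by(ROWS, COLUMNS, "author")  # objects are authors

for obj, attrs in sorted(by_title.items()):
    print(obj, sorted(attrs))
```

Switching the key from `title` to `author` gives a different object set over the same answer, which is the idea behind viewing one query result from several perspectives.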

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin, and dbp:birthDate d with d ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph patterns: ?x has dcterms:subject FrenchFilm; ?x has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples:

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

[Table: formal context with rows Person1 to Person7 and columns a to g grouped under predicates A to E. Number of crosses per row: Person1 6, Person2 5, Person3 5, Person4 4, Person5 4, Person6 2, Person7 2]

The formal context is built, after scaling, from RDF triples taken from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
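The construction of a formal context from triples can be sketched as follows. This is an illustrative Python sketch: it assumes that each distinct (predicate, object) pair is scaled to one binary attribute, and the function name `build_context` is hypothetical. The triples are the four listed on the slide.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def build_context(triples):
    """Return (objects, attributes, incidence) of the formal context.

    Objects are the triple subjects, attributes are (predicate, object)
    pairs, and the incidence relation holds one cross per triple.
    """
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


objects, attributes, incidence = build_context(TRIPLES)

# Print the cross table, one row per entity.
for g in objects:
    row = ["x" if (g, m) in incidence else "." for m in attributes]
    print(g, " ".join(row))
```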

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                          Confidence  Support  Meaning
$d \Rightarrow c$             100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$             100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$          100%        2        All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$       100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$       71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

• Example:
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $\{a, b\} \equiv \{c, d\}$, i.e. $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$
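The reasoning above can be illustrated with a small Python sketch. The context below is hypothetical: it is only chosen to be consistent with the figures on the slide (seven entities with a and b, five of them with c and d, two with e), and attributes f and g are left out. The helper names are assumptions as well.

```python
# Hypothetical context consistent with the rule table of the running example.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    if not covered:
        return 0.0
    return len(extent(premise | conclusion)) / len(covered)


def completion_candidates(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)


ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))                      # implication in one direction
print(round(confidence(ab, cd), 2))            # high-confidence rule in the other
print(sorted(completion_candidates(ab, cd)))   # entities to complete
```

If the expert accepts {a, b} ≡ {c, d} as a definition, the candidates returned by `completion_candidates` are the entities whose descriptions would receive the missing attributes.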

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe, Formal Context, Mining Implications, Ranking Implications, then the question "Can a rule be a definition $X \equiv Y$?"; if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars              Videogames    Smartphones       Countries
Restriction   dc:subject        dc:subject    dc:subject        rdf:type
              dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones  Country
Predicates    rdf:type          rdf:type      rdf:type          rdf:type
              dc:subject        dc:subject    dc:subject        dc:subject
              bodyStyle         cp            manufacturer      language
              transmission      developer     operativeSystem   govenmentType
              assembly          requirement   developer         leaderType
              designer          genre         cpu               foundingDate
              layout            releaseDate                     gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset          Cars    Videogames  Smartphones  Countries
Subjects         529     655         363          3153
Objects          1291    3265        495          8315
Triples          12519   20146       4710         50000
Concepts         14657   31031       1232         13754
Exec. time [s]   1732    1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

[Figure: Process of Interactively Exploring RDF Data]

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relation, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT: dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Entity               Crosses in K_A to K_E (attributes a to g)   K_dbo:productionStartYear
Reventon             6                                            ⟨[2008, 2008]⟩
Countach             5                                            ⟨[1974, 1974]⟩
350GT                5                                            ⟨[1963, 1963]⟩
400GT                4                                            ⟨[1965, 1965]⟩
Islero               4                                            ⟨[1967, 1967]⟩
Veneno               2                                            ⟨[2012, 2012]⟩
Aventador Roadster   2                                            -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
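The numeric part of this computation can be sketched in Python as an interval pattern structure, where the description of a set of entities is the smallest interval covering their values. This is an illustrative sketch: the function names are hypothetical, the years are the ones from the heterogeneous context, and entities without a value are assumed to fall outside every extent.

```python
# Production start years from the heterogeneous context (None: no value given).
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    "Aventador Roadster": None,
}


def describe(entities):
    """Smallest interval covering the years of the given entities."""
    years = [YEARS[g] for g in entities]
    if not years or any(y is None for y in years):
        return None  # no interval describes an entity without a value
    return (min(years), max(years))


def extent(interval):
    """Entities whose year lies inside the interval."""
    if interval is None:
        return set()
    lo, hi = interval
    return {g for g, y in YEARS.items() if y is not None and lo <= y <= hi}


a1 = {"350GT", "400GT", "Islero"}
interval = describe(a1)
print(interval)
print(sorted(extent(interval)))
```

Here the closure of `a1` under the numeric description is `a1` itself, so intersecting it with the closure obtained from the binary attributes leaves `a1` unchanged, as on the slide.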

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: subsumption hierarchy over the classes, with ⊤ at the top and the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array).
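As an illustration of answering range minimum queries in constant time after preprocessing, here is a Python sketch based on a sparse table. Note the swap: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ quoted on the slide, whose algorithm is not reproduced here. The function names are hypothetical and positions are 0-based in the code, whereas the slide counts from 1.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi], bounds inclusive and 0-based."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]          # the array from the slide
table = build_sparse_table(values)
print(rmq(values, table, 0, 4) + 1)  # 1-based position of the minimum
```

Each query combines two overlapping power-of-two windows that together cover the range, which is valid because the minimum is insensitive to overlap.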

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9}, the corresponding positions are A = {4, 12, 20} and B = {4, 18, 22}.

Positions: 1 to 25
Nodes = [ ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤ ]
D     = [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]

An RMQ is computed for pairs of positions taken from A and B, for example:

RMQ(4, 4) = 4
RMQ(4, 18) = 15
RMQ(20, 18) = 19

Result = {4, 8, 15, 19, 21} = {4, 21}


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
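The scaling-based intersection can be sketched in Python on the taxonomy of the slides. This is a sketch under assumptions: the tree is encoded as a child-to-parent dictionary with `"T"` standing for ⊤, and the function names are hypothetical.

```python
# Taxonomy of the slides as child -> parent; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors(node):
    """All strict ancestors of a node, up to the top element."""
    result = set()
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Elements that are equal to or more general than some element of the antichain."""
    result = set(antichain)
    for node in antichain:
        result |= ancestors(node)
    return result


def most_specific(elements):
    """Drop every element that is an ancestor of another element of the set."""
    covered = set()
    for node in elements:
        covered |= ancestors(node)
    return set(elements) - covered


def intersect_by_scaling(a, b):
    return most_specific(filter_of(a) & filter_of(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(filter_of(A) & filter_of(B)))
print(sorted(intersect_by_scaling(A, B)))
```

The filters grow with the height of the taxonomy, which is the cost that the RMQ-based approach avoids.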

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 35: Interactive Knowledge Discovery over Web of Data


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, Range Minimum Query finds the minimal value in a sub-array of this array.

[Figure: taxonomy tree: ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

Positions: 1 to 25
Nodes = [ ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤ ]
D     = [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$, $A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
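The intersection worked through on this slide can be replayed with a minimal Python sketch. It assumes the Euler tour and depth array D exactly as printed on the slide; the function names, the label "T" for ⊤, and the linear-scan RMQ are illustrative choices, not the author's implementation.

```python
# Euler tour of the taxonomy as printed on the slide ("T" stands for the top element).
NODES = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
         "C11", "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8",
         "C13", "C9", "C13", "C6", "T"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_strict_ancestor(p, q):
    """True if the node at position p is a strict ancestor of the node at position q."""
    return NODES[p - 1] != NODES[q - 1] and NODES[rmq(p, q) - 1] == NODES[p - 1]


def intersect(a_positions, b_positions):
    """Intersect two antichains given as 1-based Euler-tour positions."""
    merged = sorted(a_positions + b_positions)
    # One RMQ per pair of consecutive positions in the merged list.
    candidates = [rmq(p, q) for p, q in zip(merged, merged[1:])]
    by_node = {NODES[c - 1]: c for c in candidates}  # drop repeated nodes
    # Keep only the most specific candidates (those that are nobody's ancestor).
    return sorted(node for node, p in by_node.items()
                  if not any(is_strict_ancestor(p, q) for q in by_node.values()))


print(intersect([4, 12, 20], [4, 18, 22]))
```

The candidates are positions 4, 8, 15, 19 and 21; after dropping ancestors and the repeated C13, the nodes C1 and C13 remain, which corresponds to the positions {4, 21} on the slide.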

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy with top element ⊤ over the classes C1, C2 and C4 to C15]

[Figure: pattern concept lattice with concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ      t_scaling
BK        96         5            35    626    10            840897    37 sec     42 sec
LO        16         7            16    224    26            1875      0043 sec   0088 sec
NT        131        3            131   140    6             128624    36 sec     68 sec
PO        60         16           22    1236   58            416837    49 sec     57 sec
PT        5000       49           22    4084   60            452316    50 sec     38 sec
PW        200        11           94    436    21            1148656   60 sec     49 sec
PY        74         28           36    340    53            771569    46 sec     40 sec
QU        2178       4            44    8212   8             783013    28 sec     30 sec
TZ        186        61           31    626    88            650041    58 sec     43 sec
VY        52         4            52    202    15            202666    59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using scaling approach ()

What we did so far

How to allow feedback from domain expert

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT distinct ?title ?keywords ?author
    where {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER (regex(STR(?keywords), "pattern based classification", "i")
           || regex(STR(?keywords), "unsupervised classification", "i"))
    }

Graph pattern:
    ?paper --dc:creator--> ?author
    ?paper --dc:title-->   ?title
    ?paper --dc:subject--> ?keywords

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

Matched instance:
    NapoliLS97 --dc:creator--> Amedeo_Napoli
    NapoliLS97 --dc:title-->   title1
    NapoliLS97 --dc:subject--> Classification

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title    ?keyword             ?author
title1    RDF                  author1
title2    Web Crawling         author1
title2    Web Search Engines   author2
title2    RDF                  author1
title2    RDF                  author2
title2    Web Crawling         author1

• $E_v = \{title\}$
• $A_v = \{author, keyword\}$
• $G = \mu(E_v) = \{title1, title2\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
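To make the role of VIEW BY concrete, here is a minimal Python sketch that groups the result rows of the slide into objects and attributes. It is a simplification under stated assumptions: the function name `view_by` and the (column, value) attribute encoding are hypothetical, and the actual Lattice-Based View Access approach may organise the attribute sets differently.

```python
# Result rows of the query, copied from the slide: (title, keyword, author).
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, key):
    """Group rows by the VIEW BY variable: its values become the objects,
    every (column, value) pair of the other variables becomes an attribute."""
    key_index = columns.index(key)
    context = {}
    for row in rows:
        attributes = context.setdefault(row[key_index], set())
        attributes.update((col, val) for col, val in zip(columns, row) if col != key)
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Switching the key from "title" to "author" changes which values act as objects of the context, which is the change of perspective that VIEW BY expresses.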

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

    ?x --rdf:type-->       Person
    ?x --dbp:birthPlace--> Berlin
    ?x --dbp:birthDate-->  $\leq 1900$

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

    ?x --dcterms:subject--> FrenchFilm

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

    ?x --rdf:type-->       Film
    ?x --dbo:hasCountry--> France

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom

Predicates   A  A  B  C  D  E  E
Objects      a  b  c  d  e  f  g

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Longrightarrow Y$ and $Y \Longrightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Longrightarrow Y$ and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                            Confidence   Support   Meaning
$d \Longrightarrow c$           100%         5         Every scientist has won a Turing Award
$c \Longrightarrow d$           100%         5         Every person who has won a Turing Award is a scientist
$e \Longrightarrow c, d$        100%         2         All the people having the field computer science are Turing Award winner scientists
$c, d \Longrightarrow a, b$     100%         5         All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$         71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

• Example:
  - $c, d \Longrightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Longrightarrow c, d$ and $c, d \Longrightarrow a, b$
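The confidence value of the running example can be reproduced with a short Python sketch. The context below is an assumption: the transcript only preserves how many crosses each person has, so the placement of attributes e and f is hypothetical, chosen to agree with the supports listed in the rule table. The helper names are illustrative as well.

```python
# Hypothetical context: cross counts per person follow the slide (6, 5, 5, 4, 4, 2, 2),
# but which of the columns e, f, g each cross falls in is an assumption.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "f"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def support(attributes):
    """Number of entities whose description contains all the given attributes."""
    return sum(1 for desc in CONTEXT.values() if attributes <= desc)


def confidence(premise, conclusion):
    """Confidence of the association rule premise -> conclusion."""
    return support(premise | conclusion) / support(premise)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(e for e, desc in CONTEXT.items()
                  if premise <= desc and not conclusion <= desc)


print(confidence({"c", "d"}, {"a", "b"}))            # implication, confidence 1.0
print(round(confidence({"a", "b"}, {"c", "d"}), 2))  # 5 of 7 entities
print(needs_completion({"a", "b"}, {"c", "d"}))
```

Under these assumptions the rule from {a, b} to {c, d} holds for 5 of the 7 entities, and the two entities that break it are the candidates for completion.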

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow. Reference Universe, Formal Context, Mining Implications, Ranking Implications, "Can a rule be a definition $X \equiv Y$?", Yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames     Smartphones        Countries
Restriction   dc:subject         dc:subject     dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type       rdf:type           rdf:type
              dc:subject         dc:subject     dc:subject         dc:subject
              bodyStyle          cp             manufacturer       language
              transmission       developer      operativeSystem    governmentType
              assembly           requirement    developer          leaderType
              designer           genre          cpu                foundingDate
              layout             releaseDate                       gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         07            5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  • Completing RDF-Data using Association Rule Mining
  • Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

    350GT --dc:subject-->         dbpc:Sports_cars
    350GT --dc:subject-->         dbpc:Lamborghini_vehicles
    350GT --rdf:type-->           dbo:Automobile
    350GT --dbo:manufacturer-->   dbp:Lamborghini
    350GT --dbo:productionYear--> 1964 (xsd:date)

Why Pattern Structures

Contexts: $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g), $K_{dbo:productionStartYear}$

Entity               Crosses (a to g)   $K_{dbo:productionStartYear}$
Reventon             × × × × × ×        $\langle[2008, 2008]\rangle$
Countach             × × × × ×          $\langle[1974, 1974]\rangle$
350GT                × × × × ×          $\langle[1963, 1963]\rangle$
400GT                × × × ×            $\langle[1965, 1965]\rangle$
Islero               × × × ×            $\langle[1967, 1967]\rangle$
Veneno               × ×                $\langle[2012, 2012]\rangle$
Aventador Roadster   × ×                -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \times_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts: $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g), $K_{dbo:productionStartYear}$

Entity               Crosses (a to g)   $K_{dbo:productionStartYear}$
Reventon             × × × × × ×        $\langle[2008, 2008]\rangle$
Countach             × × × × ×          $\langle[1974, 1974]\rangle$
350GT                × × × × ×          $\langle[1963, 1963]\rangle$
400GT                × × × ×            $\langle[1965, 1965]\rangle$
Islero               × × × ×            $\langle[1967, 1967]\rangle$
Veneno               × ×                $\langle[2012, 2012]\rangle$
Aventador Roadster   × ×                -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
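The numeric component of this example can be checked with a minimal Python sketch. It assumes that interval descriptions are combined by taking their convex hull, as in interval pattern structures; the function names are illustrative, and Aventador Roadster is left out because its year is missing in the table.

```python
from functools import reduce

# Production start years from the table, as degenerate intervals [y, y].
YEARS = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
}


def meet(first, second):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(first[0], second[0]), max(first[1], second[1]))


def description(entities):
    """Common interval description of a set of entities."""
    return reduce(meet, (YEARS[e] for e in entities))


def extent(interval):
    """All entities whose interval lies inside the given interval."""
    low, high = interval
    return {e for e, (lo, hi) in YEARS.items() if low <= lo and hi <= high}


a1 = {"350GT", "400GT", "Islero"}
print(description(a1))          # (1963, 1967)
print(extent(description(a1)))  # the same three cars, so A1 is closed
```

Applying `extent` to the description of A1 returns A1 itself, which mirrors the closure of A1 in the numeric context on the slide.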

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: taxonomy with top element ⊤ over the classes C1, C2 and C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array       2  1  0  3  2
Positions   1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
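As an illustration of answering range minimum queries in constant time after preprocessing, here is a minimal Python sketch using a sparse table on the array from this slide. Note the swap: a sparse table needs $O(n \log n)$ preprocessing, so it is a simpler stand-in for, not an instance of, the $O(n)$ scheme the slide refers to. Function names are illustrative and indices are 0-based.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        previous, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = previous[i], previous[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def range_min_index(values, table, low, high):
    """Index of the minimum of values[low..high] (inclusive), in O(1)."""
    k = (high - low + 1).bit_length() - 1
    left, right = table[k][low], table[k][high - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
# Whole array: the minimum 0 sits at index 2, i.e. position 3 on the slide.
print(range_min_index(ARRAY, TABLE, 0, 4) + 1)
```

Each query combines two overlapping power-of-two windows that together cover the range, which is why no loop is needed at query time.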

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, giving $\{4\}$
RMQ(4, 18) = 15, giving $\{4, 15\}$
RMQ(20, 18) = 19, giving $\{4, 15, 19\}$

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
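The filter-based intersection can be reproduced with a few lines of Python. This sketch assumes the tree read off the Euler tour used elsewhere in the deck; the names `PARENT`, `filter_of` and `antichain_of` are illustrative, and "T" stands for ⊤.

```python
# Parent of every node in the taxonomy ("T" is the top element and has no parent).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes that are equal to or more general than some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def antichain_of(upset):
    """Most specific elements of an upward-closed set: those that are nobody's parent."""
    parents = {PARENT[node] for node in upset if node in PARENT}
    return upset - parents


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common))
print(sorted(antichain_of(common)))
```

The cost of this route is governed by the size of the filters, which can grow with the whole tree, whereas the RMQ route only touches the elements of the two antichains.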

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of that array

[Figure: taxonomy tree. ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has leaves C1 and C2; C11 has leaves C4 and C5; C6 has child C13; C13 has leaves C7, C8 and C9]

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤

Mehwish Alam Interactive Knowledge Discovery over Web of Data 23

Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}

Positions 8 (C12) and 15 (⊤) are ancestors of C1, and positions 19 and 21 both denote C13, so the result corresponds to the antichain {C1, C13}

Mehwish Alam Interactive Knowledge Discovery over Web of Data 24
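The walkthrough above can be reproduced with a short sketch. It makes several assumptions that are not in the slides: `"TOP"` stands for ⊤, the RMQ is a plain linear scan rather than a constant-time structure, only consecutive positions coming from different antichains are queried, and the last step that drops ancestors and repeats is one possible reading of how {4, 8, 15, 19, 21} becomes {4, 21}. All function names are hypothetical.

```python
# Example taxonomy as a parent -> children map; "TOP" stands for the root ⊤.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths, revisiting parents."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])


def first_position(node):
    """1-based position of the first occurrence of a node in the tour."""
    return NODES.index(node) + 1


def is_proper_ancestor(u, v):
    """u is a proper ancestor of v when their lowest common ancestor is u."""
    lca = NODES[rmq(first_position(u), first_position(v)) - 1]
    return u != v and lca == u


def intersect(a, b):
    """Intersect two antichains with RMQs on consecutive tour positions."""
    tagged = sorted([(first_position(x), "A") for x in a] +
                    [(first_position(x), "B") for x in b])
    hits = [rmq(p, q)
            for (p, s), (q, t) in zip(tagged, tagged[1:]) if s != t]
    candidates = {NODES[h - 1] for h in hits}
    # Keep only the most specific candidates.
    return {c for c in candidates
            if not any(is_proper_ancestor(c, d) for d in candidates)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

Swapping the linear scan for a preprocessed RMQ structure would leave `intersect` unchanged.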

Resulting Pattern Concept Lattice

Entities S    Descriptions d
s1            (dc:subject, {C1, C2, C7})
s2            (dc:subject, {C6, C8, C9})
s3            (dc:subject, {C4, C5})
s4            (dc:subject, {C4, C7, C8})
s5            (dc:subject, {C8, C9})

[Figure: taxonomy of classes (⊤, C1, C2, C4 to C15) and the resulting pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID    Extent            Intent
K1     s1, s2, s4, s5    (p1, {C14})
K2     s1, s3, s4        (p1, {C12})
K3     s1, s4            (p1, {C7, C12})
K4     s2, s4, s5        (p1, {C8})
K5     s3, s4            (p1, {C4})
K6     s1                (p1, {C1, C2, C7})
K8     s2, s5            (p1, {C8, C9})
K7     s4                (p1, {C4, C7, C8})
K9     s3                (p1, {C4, C5})
K10    s2                (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Mehwish Alam Interactive Knowledge Discovery over Web of Data 25

Experimentation

Dataset            |G|     |T|      |Leaves(T)|   |L|        t_RMQ   t_scaling
DBLP               5293    33207    33198         10134      45s     21s
Biomedical Data    63      1490     933           1725582    145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0043 sec    0088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: (m1)′ ⊆ (m2)′ means that m1 ≤ m2
• Not all datasets could be processed using scaling approach ()

Mehwish Alam Interactive Knowledge Discovery over Web of Data 26

What we did so far

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

How to allow feedback from domain expert

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID    Extent            Intent
K1     s1, s2, s4, s5    (p1, {C14})
K2     s1, s3, s4        (p1, {C12})
K3     s1, s4            (p1, {C7, C12})
K4     s2, s4, s5        (p1, {C8})
K5     s3, s4            (p1, {C4})
K6     s1                (p1, {C1, C2, C7})
K8     s2, s5            (p1, {C8, C9})
K7     s4                (p1, {C4, C7, C8})
K9     s3                (p1, {C4, C5})
K10    s2                (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 28

Roadmap

• Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

• Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 29

Mehwish Alam Interactive Knowledge Discovery over Web of Data 30

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern. ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping μ : V → U

Given U and V, a mapping μ is a partial function μ : V → U, e.g. μ1 : ?keywords → Classification

[Figure: instantiated graph. NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 31

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title     keyword               author
title1    RDF                   author1
title2    Web Crawling          author1
title2    Web Search Engines    author2
title2    RDF                   author1
title2    RDF                   author2
title2    Web Crawling          author1

• Ev = {?title}

• Av = {?author, ?keyword}

• G = μ(Ev) = {title1, title2, ...}

• M1 = μ(Av1) = {author1, author2, ...}

• M2 = μ(Av2) = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 32
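As a rough illustration of how a VIEW BY clause could turn query results into a formal context, the sketch below groups result rows by the chosen variable and treats the remaining bindings as attributes. The rows are hypothetical sample data and `view_by` is an illustrative helper, not part of the Lattice-Based View Access implementation.

```python
from collections import defaultdict

# Hypothetical result rows in the order (title, keyword, author).
COLUMNS = ["title", "keyword", "author"]
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows, columns, object_column):
    """Group rows by the VIEW BY column; other bindings become attributes."""
    idx = columns.index(object_column)
    context = defaultdict(set)
    for row in rows:
        for column, value in zip(columns, row):
            if column != object_column:
                context[row[idx]].add((column, value))
    return dict(context)


# Same answers, two perspectives: objects are titles, then authors.
by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(by_title)
print(by_author)
```

Changing only the `object_column` argument mirrors switching from VIEW BY ?title to VIEW BY ?author over the same result set.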

Views from Dierent Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Mehwish Alam Interactive Knowledge Discovery over Web of Data 33

Mehwish Alam Interactive Knowledge Discovery over Web of Data 35

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern. ?x is linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a date ≤ 1900 by dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern. ?x is linked to FrenchFilm by dcterms:subject]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

Problem Statement

French Films

[Figure: graph pattern. ?x is linked to FrenchFilm by dcterms:subject, to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates                  Objects
Index   URI                 Index   URI
A       dc:subject          a       dbpc:Computer_Scientists
                            b       dbpc:Turing_Award_Laureates
B       dbp:award           c       dbp:TuringAward
C       rdf:type            d       dbo:Scientist
D       dbp:field           e       dbp:Computer_Sciences
E       dbp:birthPlace      f       dbo:UnitedStates
                            g       dbo:UnitedKingdom

Formal context over the predicates A to E and the objects a to g (number of crosses per entity as on the slide):

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>

Mehwish Alam Interactive Knowledge Discovery over Web of Data 37

Getting denitions from implications and association rules

• If X ⇒ Y and Y ⇒ X, then X ≡ Y
• X ≡ Y is called a definition
• If X ⇒ Y holds and Y → X has high confidence, the data may be incomplete

Rule          Confidence   Support   Meaning
d ⇒ c         100%         5         Every scientist has won a Turing Award
c ⇒ d         100%         5         Every person who has won a Turing Award is a scientist
e ⇒ c, d      100%         2         Every person whose field is computer science is a scientist who has won a Turing Award
c, d ⇒ a, b   100%         5         All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d   71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - c, d ⇒ a, b
  - conf({a, b} → {c, d}) = 0.71

• Entities Person6 and Person7 need completion in their descriptions
• In fact, there is such a definition: {a, b} ≡ {c, d}, i.e. {a, b} ⇒ {c, d} and {c, d} ⇒ {a, b}

Mehwish Alam Interactive Knowledge Discovery over Web of Data 38
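The confidence and support figures can be checked with a small sketch. The exact positions of the crosses are not recoverable from the slide, so the context below is an assumption: it is chosen only to agree with the cross counts per entity and with the rule table, and the placement of e, f and g is a guess. The helper names are hypothetical, and "support" is taken here as the number of entities satisfying the premise, which matches the values in the table.

```python
# Assumed formal context (attribute letters as on the slide).
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attributes):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attributes <= desc}


def premise_support(premise):
    """Number of entities that satisfy the premise of a rule."""
    return len(extent(premise))


def confidence(premise, conclusion):
    """Fraction of entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


ab, cd = set("ab"), set("cd")
print(confidence(cd, ab), premise_support(cd))
print(round(confidence(ab, cd), 2), premise_support(ab))

# Entities that break the candidate definition {a, b} ≡ {c, d}.
to_complete = extent(ab) - extent(ab | cd)
print(sorted(to_complete))
```

Under these assumptions the entities returned in `to_complete` are exactly the ones whose descriptions lack c and d.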

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 39

Experimentation

Dataset building conditions

Dataset       Cars               Videogames     Smartphones         Countries
Restriction   dc:subject         dc:subject     dc:subject          rdf:type
              dbpc:Sports_cars   dbpc:FPS       dbpc:Smartphones    Country
Predicates    rdf:type           rdf:type       rdf:type            rdf:type
              dc:subject         dc:subject     dc:subject          dc:subject
              bodyStyle          cp             manufacturer        language
              transmission       developer      operativeSystem     govenmentType
              assembly           requirement    developer           leaderType
              designer           genre          cpu                 foundingDate
              layout             releaseDate                        gdpPppRank

Abbreviations: FPS = Front_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars     Videogames   Smartphones   Countries
Subjects        529      655          363           3153
Objects         1291     3265         495           8315
Triples         12519    20146        4710          50000
Concepts        14657    31031        1232          13754
Exec time [s]   1732     1714         07            5982

Dataset characteristics

Mehwish Alam Interactive Knowledge Discovery over Web of Data 40

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Points of View

• Hide non-interesting parts of lattice

• Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45


Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7 Conclusion and Perspectives

Conclusion

• Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice Based View Access

• Ways to utilize this structure
  - Completing RDF-Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. X → Y and Y → X are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 59

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph around 350GT. dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dbo:productionYear 1964 (xsd:date)]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

                      K_A  K_B  K_C  K_D  K_E       K_dbo:productionStartYear
                      a b c d e f g
Reventon              × × × × × ×                   ⟨[2008, 2008]⟩
Countach              × × × × ×                     ⟨[1974, 1974]⟩
350GT                 × × × × ×                     ⟨[1963, 1963]⟩
400GT                 × × × ×                       ⟨[1965, 1965]⟩
Islero                × × × ×                       ⟨[1967, 1967]⟩
Veneno                × ×                           ⟨[2012, 2012]⟩
Aventador Roadster    × ×                           -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

• Let a predicate p ∈ P:
  - range(p) ⊆ U: p is an object relation
  - range(p) ⊆ L: p is a literal relation

• Pattern Structures K_p = (G, (D_p, ⊓), δ_p):
  - (D_p, ⊑) is an ordered set of descriptions defined for the elements in range(p)
  - δ_p maps entities g ∈ G to their descriptions in D_p

• Heterogeneous Pattern Structures (G, H, Δ):
  - H = ×D_p (p ∈ P) is the Cartesian product of all the description sets D_p
  - Δ maps an entity g ∈ G to a tuple where each component corresponds to a description in a set D_p

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

                      K_A  K_B  K_C  K_D  K_E       K_dbo:productionStartYear
                      a b c d e f g
Reventon              × × × × × ×                   ⟨[2008, 2008]⟩
Countach              × × × × ×                     ⟨[1974, 1974]⟩
350GT                 × × × × ×                     ⟨[1963, 1963]⟩
400GT                 × × × ×                       ⟨[1965, 1965]⟩
Islero                × × × ×                       ⟨[1967, 1967]⟩
Veneno                × ×                           ⟨[2012, 2012]⟩
Aventador Roadster    × ×                           -

Let A1 = {350GT, 400GT, Islero}, then

• (A1)°°:
  - (A1)° = {a, b, c, d}
  - {a, b, c, d}° = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)°° = {Reventon, Countach, 350GT, 400GT, Islero}

• (A1)″:
  - (A1)′ = [1963-1967] and [1963-1967]′ = A1
  - (A1)″ = A1

• (A1)⋄⋄ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1

• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62
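The closure computed above can be replayed in code. This is a sketch under assumptions: the slide does not preserve which of the columns a to g each cross belongs to, so the `BINARY` descriptions below are a guess that only respects the number of crosses per car and the stated derivation; the years are taken from the table. Function names are illustrative.

```python
# Assumed binary descriptions (cross positions are a guess, see above).
BINARY = {
    "Reventon": set("abcdef"),
    "Countach": set("abcde"),
    "350GT": set("abcdf"),
    "400GT": set("abcd"),
    "Islero": set("abcd"),
    "Veneno": set("ab"),
    "Aventador Roadster": set("ab"),
}
# Production start years from the table (Aventador Roadster has no value).
YEAR = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
        "400GT": 1965, "Islero": 1967, "Veneno": 2012}


def common_attributes(entities):
    """Binary attributes shared by all the given entities."""
    return set.intersection(*(BINARY[g] for g in entities))


def having(attributes):
    """Entities whose binary description contains all the attributes."""
    return {g for g, desc in BINARY.items() if attributes <= desc}


def interval(entities):
    """Smallest interval covering the years of the given entities."""
    years = [YEAR[g] for g in entities]
    return min(years), max(years)


def within(lo, hi):
    """Entities whose year lies inside the interval [lo, hi]."""
    return {g for g, y in YEAR.items() if lo <= y <= hi}


def heterogeneous_closure(entities):
    """Intersect the closures obtained from each component."""
    return having(common_attributes(entities)) & within(*interval(entities))


A1 = {"350GT", "400GT", "Islero"}
print(sorted(common_attributes(A1)), interval(A1))
print(sorted(heterogeneous_closure(A1)))
```

The binary component alone would also admit Reventon and Countach; the interval component is what narrows the closure back to A1.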

The proposition of [Carpineto and Romano 1996]

• Let K = (G, M, I) be a formal context

• An extended set of attributes M* is organized in a subsumption hierarchy based on a partial ordering ≤_M*

• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S   C1   C2   C4   C5   C6   C7   C8   C9
s1           x    x                   x
s2                               x         x    x
s3                     x    x
s4                     x              x    x
s5                                         x    x

[Figure: taxonomy of classes with root ⊤, inner classes C10 to C15 and the classes C1, C2, C4 to C9]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set (M, ≤_M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M, x ≤ y}

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise incomparable

Mehwish Alam Interactive Knowledge Discovery over Web of Data 67

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set

• For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2, m ≤ ac1 and m ≤ ac2}

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array       [ 2 1 0 3 2 ]
Positions     1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and O(1) time per query (where n is the number of elements in the array)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68
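A constant-time RMQ can be illustrated with a sparse table. Note the swap: this simpler structure needs O(n log n) preprocessing rather than the O(n) bound quoted above, which requires a more involved construction. The function names are hypothetical, and positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the 0-based index of a minimum in values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, i, j):
    """1-based position of a minimal value between positions i and j (i <= j)."""
    lo, hi = i - 1, j - 1
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping power-of-two windows cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return (left if values[left] <= values[right] else right) + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 1, 5))  # position of the minimum over the whole array
print(rmq(ARRAY, TABLE, 4, 5))  # position of the minimum over positions 4..5
```

Each query reads two precomputed entries and compares them, independent of the length of the range.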

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69
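The naive approach can be sketched as one RMQ per pair $(a, b)$ with $a \in A$ and $b \in B$, followed by keeping only the most specific nodes. The sketch below assumes the depth array and node labels of the slide; `rmq` is a plain linear scan and the helper names are hypothetical.

```python
# Euler tour of the taxonomy as shown on the slide (positions are 1-based there).
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODE = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4", "C11",
        "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8", "C13", "C9",
        "C13", "C6", "TOP"]

# First and last tour position of every node, used for the ancestor test.
FIRST, LAST = {}, {}
for pos, name in enumerate(NODE, start=1):
    FIRST.setdefault(name, pos)
    LAST[name] = pos


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j (inclusive)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_strict_ancestor(u, v):
    """True if u lies strictly above v in the tree."""
    return u != v and FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def keep_most_specific(nodes):
    """Drop every node that is an ancestor of another node of the set."""
    return {v for v in nodes if not any(is_strict_ancestor(v, w) for w in nodes)}


def naive_intersection(a_positions, b_positions):
    """One RMQ per pair: |A| * |B| queries in total."""
    candidates = {NODE[rmq(a, b) - 1] for a in a_positions for b in b_positions}
    return keep_most_specific(candidates)


result = naive_intersection([4, 12, 20], [4, 18, 22])
```

For the running example the nine queries yield the candidates TOP, C12, C1 and C13, of which C1 and C13 remain.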

An associated scaling

• Scale the antichains to the corresponding filters.
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

⊤
  C12
    C10
      C1  C2
    C11
      C4  C5
  C6
    C13
      C7  C8  C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
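The scaling to filters can be sketched as follows, assuming the taxonomy is given as a child-to-parent map read off the tree of the slide. The names `PARENT`, `up_set`, `filter_of` and `minimal_elements` are illustrative only.

```python
# Child -> parent map of the taxonomy drawn on the slide ("TOP" stands for the top element).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the up-sets of the antichain elements."""
    return set().union(*(up_set(n) for n in antichain))


def minimal_elements(nodes):
    """Nodes of the set that have no other node of the set strictly below them."""
    return {n for n in nodes if not any(m != n and n in up_set(m) for m in nodes)}


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
antichain = minimal_elements(common)
```

Each filter may contain up to $|T|$ nodes, which is why this route costs more than working on the antichains directly.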

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs on consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures

Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array of this array.

⊤
  C12
    C10
      C1  C2
    C11
      C4  C5
  C6
    C13
      C7  C8  C9

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
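The depth array D and its node labels can be produced by an Euler tour of the tree: each node is recorded when it is first entered and again after each of its children has been visited. The sketch below assumes the tree is given as a parent-to-children map; `TREE`, `euler_tour` and `lca` are illustrative names, and `lca` uses a plain linear scan in place of a constant-time RMQ.

```python
# Parent -> ordered children, read off the tree on the slide ("TOP" is the top element).
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return (depths, nodes) of the Euler tour starting at root."""
    depths, nodes = [], []

    def visit(node, depth):
        depths.append(depth)
        nodes.append(node)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            # coming back from the child, the parent is recorded again
            depths.append(depth)
            nodes.append(node)

    visit(root, 0)
    return depths, nodes


def lca(depths, nodes, u, v):
    """Lowest common ancestor: node of minimal depth between the first occurrences of u and v."""
    i, j = sorted((nodes.index(u), nodes.index(v)))
    best = min(range(i, j + 1), key=lambda p: depths[p])
    return nodes[best]


depths, nodes = euler_tour(TREE, "TOP")
```

The tour has 25 entries for this tree, matching the positions 1 to 25 of the slide.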

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
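The improved computation can be sketched as follows: merge the positions of A and B in tour order, run one RMQ per pair of consecutive positions, then keep the most specific nodes. One assumption goes beyond the slide: consecutive positions coming from the same set are skipped, because only a pair with one element from A and one from B yields a common subsumer (in the running example every consecutive pair is mixed, so the five queries of the slide are reproduced). All helper names are hypothetical.

```python
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODE = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4", "C11",
        "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8", "C13", "C9",
        "C13", "C6", "TOP"]

FIRST, LAST = {}, {}
for pos, name in enumerate(NODE, start=1):
    FIRST.setdefault(name, pos)
    LAST[name] = pos


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j (inclusive)."""
    return min(range(i, j + 1), key=lambda p: DEPTH[p - 1])


def keep_most_specific(nodes):
    """Drop every node that is a strict ancestor of another node of the set."""
    def above(u, v):
        return u != v and FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]
    return {v for v in nodes if not any(above(v, w) for w in nodes)}


def improved_intersection(a_positions, b_positions):
    """At most |A| + |B| - 1 queries on consecutive positions of the merged list."""
    tagged = sorted([(p, "A") for p in a_positions] + [(p, "B") for p in b_positions])
    positions = []
    for (p, src_p), (q, src_q) in zip(tagged, tagged[1:]):
        if src_p == src_q:
            continue  # assumption: same-set neighbours give no common subsumer
        positions.append(rmq(p, q))
    antichain = keep_most_specific({NODE[p - 1] for p in positions})
    return positions, antichain


positions, antichain = improved_intersection([4, 12, 20], [4, 18, 22])
```

Positions 19 and 21 both carry the label C13, so the final antichain has two elements.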

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: taxonomy over the classes C1, C2 and C4 to C15 with top element ⊤]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K7    s4                (p1, {C4, C7, C8})
K8    s2, s5            (p1, {C8, C9})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ      t_scaling
BK        96         5            35    626    10            840897    37 sec     42 sec
LO        16         7            16    224    26            1875      0043 sec   0088 sec
NT        131        3            131   140    6             128624    36 sec     68 sec
PO        60         16           22    1236   58            416837    49 sec     57 sec
PT        5000       49           22    4084   60            452316    50 sec     38 sec
PW        200        11           94    436    21            1148656   60 sec     49 sec
PY        74         28           36    340    53            771569    46 sec     40 sec
QU        2178       4            44    8212   8             783013    28 sec     30 sec
TZ        186        61           31    626    88            650041    58 sec     43 sec
VY        52         4            52    202    15            202666    59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling.
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$.
• Not all datasets could be processed using the scaling approach ().

What we did so far

How to allow feedback from a domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS.

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering.

[Figure: pattern concept lattice with the concepts K0 to K11]

KID   Extent            Intent
K1    s1, s2, s4, s5    (p1, {C14})
K2    s1, s3, s4        (p1, {C12})
K3    s1, s4            (p1, {C7, C12})
K4    s2, s4, s5        (p1, {C8})
K5    s3, s4            (p1, {C4})
K6    s1                (p1, {C1, C2, C7})
K7    s4                (p1, {C4, C7, C8})
K8    s2, s5            (p1, {C8, C9})
K9    s3                (p1, {C4, C5})
K10   s2                (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern with ?paper linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given U and V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1$: ?keywords $\to$ Classification.

[Figure: instantiated graph with NapoliLS97 linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{\text{?title}\}$
• $A_v = \{\text{?author}, \text{?keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}\}$
• $M_1 = \mu(A_{v1}) = \{\text{author1}, \text{author2}\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
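The effect of VIEW BY can be sketched as follows: the chosen variable provides the objects G, and every other variable provides one attribute set with its own incidence relation. This is a minimal sketch over the answer rows of the slide, not the implementation behind the talk; `view_by`, `ROWS` and `VARIABLES` are illustrative names.

```python
from collections import defaultdict

# Answer rows of the query as listed on the slide: (title, keyword, author).
VARIABLES = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, variables, object_variable):
    """Build one formal context per remaining variable.

    Each context is returned as a dict: object -> set of attributes.
    """
    contexts = {v: defaultdict(set) for v in variables if v != object_variable}
    for row in rows:
        binding = dict(zip(variables, row))
        obj = binding[object_variable]
        for var, incidence in contexts.items():
            incidence[obj].add(binding[var])
    return {var: dict(incidence) for var, incidence in contexts.items()}


by_title = view_by(ROWS, VARIABLES, "title")    # VIEW BY ?title
by_author = view_by(ROWS, VARIABLES, "author")  # VIEW BY ?author
```

Switching the VIEW BY variable only changes which column plays the role of the objects, so the same answers give a different classification.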

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern with ?x rdf:type Person, ?x dbp:birthPlace Berlin, ?x dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern with ?x dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: graph pattern with ?x rdf:type Film, ?x dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates                 Objects
Index   URI                Index   URI
A       dc:subject         a       dbpc:Computer_Scientists
                           b       dbpc:Turing_Award_Laureates
B       dbp:award          c       dbp:TuringAward
C       rdf:type           d       dbo:Scientist
D       dbp:field          e       dbp:Computer Sciences
E       dbp:birthPlace     f       dbo:UnitedStates
                           g       dbo:UnitedKingdom

Predicates: A B C D E
Attributes: a b c d e f g

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
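The construction of a formal context from triples can be sketched as follows, taking subjects as objects and (predicate, object) pairs as attributes. The triples of Person1 are those of the slide; the two triples of Person2 are hypothetical and only serve to make the derivation operators non-trivial. Function names are illustrative.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
    # hypothetical triples, not taken from the slide
    ("Person2", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person2", "rdf:type", "dbo:Scientists"),
]


def build_formal_context(triples):
    """Return (G, M, I): subjects, (predicate, object) attributes, incidence relation."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


def intent(entities, attributes, incidence):
    """Attributes shared by all the given entities."""
    return {m for m in attributes if all((g, m) in incidence for g in entities)}


def extent(attrs, objects, incidence):
    """Entities having all the given attributes."""
    return {g for g in objects if all((g, m) in incidence for m in attrs)}


G, M, I = build_formal_context(TRIPLES)
shared = intent({"Person1", "Person2"}, M, I)
in_field = extent({("dbp:field", "dbp:Computer Sciences")}, G, I)
```

Each element of the incidence relation corresponds to one cross of the context.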

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete.

Rule                      Confidence   Support   Meaning
$d \Rightarrow c$         100%         5         Every scientist has won a Turing Award.
$c \Rightarrow d$         100%         5         Every person who has won a Turing Award is a scientist.
$e \Rightarrow c, d$      100%         2         Every person whose field is computer science is a scientist who has won a Turing Award.
$c, d \Rightarrow a, b$   100%         5         All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists.
$a, b \rightarrow c, d$   71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award.

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description.
• In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
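The confidence values above can be reproduced with a few lines. The context below is an assumption: the transcript does not preserve the columns of the crosses, so attributes were placed to be consistent with the number of crosses per person and with the supports reported in the table (the support shown there counts the entities matching the premise). The helper names are illustrative.

```python
# Hypothetical placement of the crosses, consistent with the reported supports.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(context, attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, description in context.items() if attrs <= description}


def confidence(context, premise, conclusion):
    """Fraction of the entities matching the premise that also match the conclusion."""
    covered = extent(context, premise)
    if not covered:
        return 0.0
    return len(extent(context, premise | conclusion)) / len(covered)


def needs_completion(context, premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(extent(context, premise) - extent(context, conclusion))


conf_ab_cd = confidence(CONTEXT, {"a", "b"}, {"c", "d"})
conf_cd_ab = confidence(CONTEXT, {"c", "d"}, {"a", "b"})
candidates = needs_completion(CONTEXT, {"a", "b"}, {"c", "d"})
```

With five of the seven persons satisfying both sides, the rule from {a, b} to {c, d} has confidence 5/7, and the two remaining persons are the candidates for completion.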

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow from Reference Universe to Formal Context, Mining Implications and Ranking Implications, followed by the test "Can a rule be a definition $X \equiv Y$?"; if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset       Cars               Videogames    Smartphones        Countries

Dataset building conditions

Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    govenmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS = Front_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         07            5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation as the ground truth: each list of implications has a 100% recall.

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.

Thank you for your attention


8 References

References I

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

References II

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

References III

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

References IV

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections and strings.

• To deal with such data we use Pattern Structures.

[Figure: RDF graph of the entity 350GT with dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini and dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E (attributes a b c d e f g) and K_dbo:productionStartYear

Entity               Binary attributes   dbo:productionStartYear
Reventon             × × × × × ×         ⟨[2008, 2008]⟩
Countach             × × × × ×           ⟨[1974, 1974]⟩
350GT                × × × × ×           ⟨[1963, 1963]⟩
400GT                × × × ×             ⟨[1965, 1965]⟩
Islero               × × × ×             ⟨[1967, 1967]⟩
Veneno               × ×                 ⟨[2012, 2012]⟩
Aventador Roadster   × ×                 -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $range(p) \subseteq U$, $p$ is an object relation
  - if $range(p) \subseteq L$, $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E (attributes a b c d e f g) and K_dbo:productionStartYear

Entity               Binary attributes   dbo:productionStartYear
Reventon             × × × × × ×         ⟨[2008, 2008]⟩
Countach             × × × × ×           ⟨[1974, 1974]⟩
350GT                × × × × ×           ⟨[1963, 1963]⟩
400GT                × × × ×             ⟨[1965, 1965]⟩
Islero               × × × ×             ⟨[1967, 1967]⟩
Veneno               × ×                 ⟨[2012, 2012]⟩
Aventador Roadster   × ×                 -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
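The closure computed on this slide can be sketched by combining a binary derivation with an interval derivation and intersecting the two extents. The binary descriptions below are an assumption: the transcript keeps only the number of crosses per entity, so the attributes were placed to agree with the derivations shown above. The production years are those of the table, and all function names are illustrative.

```python
# Hypothetical placement of the crosses (only their number per entity is known).
BINARY = {
    "Reventon": {"a", "b", "c", "d", "e", "f"},
    "Countach": {"a", "b", "c", "d", "e"},
    "350GT": {"a", "b", "c", "d", "e"},
    "400GT": {"a", "b", "c", "d"},
    "Islero": {"a", "b", "c", "d"},
    "Veneno": {"c", "d"},
    "Aventador Roadster": {"c", "d"},
}
# dbo:productionStartYear as listed in the table (None where the value is missing).
YEAR = {"Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
        "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None}


def binary_intent(entities):
    """Attributes shared by all entities (entities must be non-empty)."""
    return set.intersection(*(BINARY[g] for g in entities))


def binary_extent(attrs):
    return {g for g, description in BINARY.items() if attrs <= description}


def interval_intent(entities):
    """Smallest interval covering the years of all entities, or None if a year is missing."""
    years = [YEAR[g] for g in entities]
    if any(y is None for y in years):
        return None
    return (min(years), max(years))


def interval_extent(interval):
    if interval is None:  # no numeric constraint
        return set(YEAR)
    lo, hi = interval
    return {g for g, y in YEAR.items() if y is not None and lo <= y <= hi}


def heterogeneous_closure(entities):
    """Intersection of the closures obtained in the binary and in the interval part."""
    return binary_extent(binary_intent(entities)) & interval_extent(interval_intent(entities))


A1 = {"350GT", "400GT", "Islero"}
closure = heterogeneous_closure(A1)
```

The binary part alone would also admit Reventon and Countach; the interval [1963, 1967] excludes them, so A1 is closed.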

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1   C2   C4   C5   C6   C7   C8   C9
s1           x    x                   x
s2                               x         x    x
s3                     x    x
s4                     x              x    x
s5                                         x    x

[Figure: taxonomy over the classes C1, C2 and C4 to C15 with top element ⊤]

The proposition of [Ganter and Kuznetsov 2001]

• [Ganter and Kuznetsov 2001] introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, the attribute set $(M, \leq_M)$ is assumed to be finite and partially ordered, and all attribute combinations that can occur must be order ideals (downsets) of this order
• Any order ideal $Idl$ can then be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$
• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the Range Minimum Query (RMQ) problem
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array     [ 2 1 0 3 2 ]
Positions   1 2 3 4 5

Computational Time

Such a position can be found in $O(1)$ time per query after $O(n)$ preprocessing time, where $n$ is the number of elements in the array.
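To make the RMQ problem concrete, here is a minimal Python sketch using a sparse table. This is a swapped-in, simpler technique: it answers each query in constant time but needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted above. The function names `build_sparse_table` and `rmq` are hypothetical, and indices are 0-based inside the code.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k] (leftmost on ties)."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the minimum of values[lo..hi], both bounds inclusive."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# The slide numbers positions from 1, hence the + 1.
print(rmq(array, table, 0, 4) + 1)  # 3: the minimum 0 sits at position 3
```

Two overlapping power-of-two windows cover any query range, which is why a lookup needs only two table reads.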

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9}, the positions in the array are A = {4, 12, 20} and B = {4, 18, 22}.

D        [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Node     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Each element of A is queried against each element of B:

RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}

The remaining pairs add positions 8 and 21.

Result = {4, 8, 15, 19, 21} = {4, 21}

Positions 19 and 21 both denote C13, while positions 8 (C12) and 15 (⊤) are ancestors of C1 and are dropped, so the result is the antichain {C1, C13}.
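The naive approach can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's code: the tree in `PARENT` is read off the node row of the array above, `TOP` stands for ⊤, the RMQ is a plain linear scan and not a constant-time structure, and all function names are hypothetical.

```python
from itertools import product

# Child -> parent, read off the node row of the array; "TOP" stands for the root.
PARENT = {
    "C12": "TOP", "C6": "TOP", "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(parent, root="TOP"):
    """Return the visited nodes and their depths, as in the rows Node and D."""
    children = {}
    for child, par in parent.items():  # dict order keeps the sibling order
        children.setdefault(par, []).append(child)
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lca(nodes, depths, first, u, v):
    """Lowest common ancestor: the shallowest node between the two first visits."""
    lo, hi = sorted((first[u], first[v]))
    pos = min(range(lo, hi + 1), key=depths.__getitem__)  # linear-scan RMQ
    return nodes[pos]


def is_proper_ancestor(parent, x, y):
    """True if x lies strictly above y in the tree."""
    while y in parent:
        y = parent[y]
        if y == x:
            return True
    return False


def intersect(a, b, parent=PARENT):
    """Most specific common ancestors over all pairs (u, v) with u in a, v in b."""
    nodes, depths = euler_tour(parent)
    first = {}
    for pos, node in enumerate(nodes):
        first.setdefault(node, pos)
    candidates = {lca(nodes, depths, first, u, v) for u, v in product(a, b)}
    return {c for c in candidates
            if not any(is_proper_ancestor(parent, c, other) for other in candidates)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

The double loop over `product(a, b)` is what makes this variant quadratic in the sizes of the two antichains.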

An associated scaling

• Scale the antichains to the corresponding filters
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}
B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
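The scaling to filters can be sketched in Python in the same spirit. The tree in `PARENT` is the one recoverable from the example, `TOP` stands for ⊤, and the names `filter_of` and `minimal_elements` are hypothetical; this is a sketch of the idea, not the implementation used in the experiments.

```python
# Child -> parent for the example tree; "TOP" stands for the root.
PARENT = {
    "C12": "TOP", "C6": "TOP", "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain, parent=PARENT):
    """All nodes greater than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = parent.get(node)
    return result


def minimal_elements(upset, parent=PARENT):
    """Nodes of an upward-closed set that have no child inside the set."""
    parents_inside = {parent[n] for n in upset if n in parent}
    return upset - parents_inside


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```

Each filter walks up to the root, so its size grows with the depth of the tree, which is the cost the complexity comparison refers to.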

Complexity

Complexity of Naive Approach
The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach
The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements, and thus the number of RMQs, is $O(|A| + |B|)$.

Complexity of associated scaling
The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, so intersecting two antichains by means of a scaling costs $O(|T|)$, which is more expensive than the $O(|Leaves(T)|)$ needed to intersect antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• Using such a reduction is still computationally more efficient than computing the intersection of antichains in a poset by means of a scaling


Using Range Minimum Query for Computing LCA

Range Minimum Query (RMQ)

Given an array of comparable objects, a Range Minimum Query finds the minimal value in a sub-array.

[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

D        [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Node     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25


Range Minimum Query - An Implementation

• How to compute the intersection of A = {C1, C5, C8} and B = {C1, C7, C9}?

D        [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Node     ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A = {4, 12, 20}
B = {4, 18, 22}
A ∪ B = {4_A, 4_B, 12_A, 18_B, 20_A, 22_B}
RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21
Result = {4, 8, 15, 19, 21} = {4, 21}

Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: subsumption hierarchy over the attributes, with top element ⊤ and nodes C1, C2 and C4 to C15]

[Figure: pattern concept lattice with concepts K0 to K11]

Pattern Concept lattice

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|    |T|     |Leaves(T)|  |L|       t_RMQ  t_scaling
DBLP             5293   33207   33198        10134     45s    21s
Biomedical Data  63     1490    933          1725582   145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ     t_scaling
BK       96        5           35   626   10           840897   37 sec    42 sec
LO       16        7           16   224   26           1875     0043 sec  0088 sec
NT       131       3           131  140   6            128624   36 sec    68 sec
PO       60        16          22   1236  58           416837   49 sec    57 sec
PT       5000      49          22   4084  60           452316   50 sec    38 sec
PW       200       11          94   436   21           1148656  60 sec    49 sec
PY       74        28          36   340   53           771569   46 sec    40 sec
QU       2178      4           44   8212  8            783013   28 sec    30 sec
TZ       186       61          31   626   88           650041   58 sec    43 sec
VY       52        4           52   202   15           202666   59 sec    116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

How to allow feedback from domain expert

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs


SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, with ?paper linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given U and V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1 : ?keywords \to Classification$

[Figure: the graph pattern after mapping, with nodes title1, NapoliLS97, Amedeo_Napoli and Classification linked by dc:creator, dc:title and dc:subject]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2\}$
• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
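How a VIEW BY variable turns a result table into a formal context can be illustrated with a short Python sketch. It is only a sketch under assumptions: the rows are copied from the result table above as printed, the names `ROWS`, `COLUMNS` and `view_by` are hypothetical, and the concept lattice construction itself is not shown.

```python
from collections import defaultdict

# Rows of the result table above, as (title, keyword, author).
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, entity_var):
    """Group rows by entity_var; every other column value becomes an attribute."""
    idx = columns.index(entity_var)
    context = defaultdict(set)
    for row in rows:
        for column, value in zip(columns, row):
            if column != entity_var:
                context[row[idx]].add((column, value))
    return dict(context)


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title))              # ['title1', 'title2']
print(sorted(by_author["author2"]))
```

Switching the view variable from `title` to `author` changes which values act as objects and which act as attributes, without running the query again.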

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a value ≤ 1900 by dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x linked to FrenchFilm by dcterms:subject; alternatively, ?x linked to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

A B C D E
a b c d e f g

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete

Rule                       Confidence  Support  Meaning
$d \Rightarrow c$          100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$          100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$       100%        2        All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$    100%        5        All the Scientists winning Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$            71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won Turing Award

Association rules for the running example

• Example
  - $c, d \to a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
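The confidence figures above can be checked with a small Python sketch. The context below is partly an assumption: the slides fix only the number of crosses per person and the rule statistics, so the placement of attributes e, f and g is hypothetical, as are the names `CONTEXT`, `support`, `confidence` and `needs_completion`.

```python
# Hypothetical context: row sizes and rule statistics follow the slides,
# but the exact placement of attributes e, f and g is an assumption.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdf"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def support(attrs, context=CONTEXT):
    """Number of entities whose description contains all of attrs."""
    return sum(1 for description in context.values() if attrs <= description)


def confidence(premise, conclusion, context=CONTEXT):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion, context) / support(premise, context)


def needs_completion(premise, conclusion, context=CONTEXT):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return sorted(entity for entity, description in context.items()
                  if premise <= description and not conclusion <= description)


ab, cd = set("ab"), set("cd")
print(round(confidence(ab, cd), 2))  # 0.71
print(confidence(cd, ab))            # 1.0
print(needs_completion(ab, cd))      # ['Person6', 'Person7']
```

The entities returned by `needs_completion` are exactly the counterexamples that keep the rule from being an implication, which is why they are the candidates for completion.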

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the question "Can a rule be a definition X ≡ Y?", and, on "Yes", Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    govenmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames  Smartphones  Countries
Subjects        529     655         363          3153
Objects         1291    3265        495          8315
Triples         12519   20146       4710         50000
Concepts        14657   31031       1232         13754
Exec time [s]   1732    1714        07           5982

Dataset characteristics

Evaluation

• There is no ground truth for the triples to be added to each dataset
• Human evaluation serves as the ground truth; each list of implications has 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  • Completing RDF-Data using Association Rule Mining
  • Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., cases where $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account
• Perform large-scale experiments


Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• Objects may have different data types and structures, including dates, numbers, collections, and strings.

• To deal with such data we use Pattern Structures.

[Figure: RDF graph for the 350GT: dc:subject dbpc:Sports_cars and dbpc:Lamborghini_vehicles; rdf:type dbo:Automobile; dbo:manufacturer dbp:Lamborghini; dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E (binary attributes a b c d e f g) and K_dbo:productionStartYear

Entity              Crosses over a-g   dbo:productionStartYear
Reventon            × × × × × ×        ⟨[2008, 2008]⟩
Countach            × × × × ×          ⟨[1974, 1974]⟩
350GT               × × × × ×          ⟨[1963, 1963]⟩
400GT               × × × ×            ⟨[1965, 1965]⟩
Islero              × × × ×            ⟨[1967, 1967]⟩
Veneno              × ×                ⟨[2012, 2012]⟩
Aventador Roadster  × ×                -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $range(p) \subseteq U$, $p$ is an object relation;
  - if $range(p) \subseteq L$, $p$ is a literal relation.

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$;
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$.

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$;
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$.

Heterogeneous Pattern Structures

Using the heterogeneous context with numeric values shown above, let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:

• Binary part, $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• Numeric part, $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
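The numeric part of the example above can be replayed with a short Python sketch. It is only an illustration under stated assumptions: the function names (`interval_meet`, `intent`, `extent`) are hypothetical, the similarity of two intervals is taken to be their convex hull as is usual for interval pattern structures, and the closure of the binary part is copied from the slide rather than recomputed, because the column positions of the crosses are not available here.

```python
# Production start years read from the heterogeneous context.
# Aventador Roadster has no value ("-") and is therefore left out.
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

def interval_meet(a, b):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def intent(entities):
    """Common interval description of a set of entities (first derivation)."""
    desc = None
    for e in entities:
        point = (YEARS[e], YEARS[e])
        desc = point if desc is None else interval_meet(desc, point)
    return desc

def extent(interval):
    """All entities whose year falls inside the interval (second derivation)."""
    lo, hi = interval
    return {e for e, year in YEARS.items() if lo <= year <= hi}

A1 = {"350GT", "400GT", "Islero"}
numeric_intent = intent(A1)            # (1963, 1967)
numeric_closure = extent(numeric_intent)

# Closure of the binary part, copied from the slide (not recomputed here).
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

# Heterogeneous closure: intersection of the component-wise closures.
heterogeneous_closure = BINARY_CLOSURE & numeric_closure
```

The heterogeneous closure is the intersection of the closures obtained in each component, which is why `A1` comes back unchanged even though its binary closure is larger.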

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x   .   .   .   x   .   .
s2          .   .   .   .   x   .   x   x
s3          .   .   x   x   .   .   .   .
s4          .   .   x   .   .   x   x   .
s5          .   .   .   .   .   .   x   x

[Figure: subsumption hierarchy with top ⊤ over the classes C12, C10, C11, C15, C13, C14 and C1, C2, C4, C5, C6, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in \max(Idl) : x \leq y\}$.

• Maximal elements of an ideal are pairwise not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable.

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.
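The infimum of two antichains can be sketched in a few lines of Python. The poset used here is an assumption chosen only to keep the example small: the divisors of 12 ordered by divisibility. The names `leq`, `down_set`, `maximal` and `meet` are hypothetical and do not come from the slides.

```python
# Hypothetical finite poset: divisors of 12, ordered by divisibility.
ELEMENTS = [1, 2, 3, 4, 6, 12]

def leq(x, y):
    """x <= y in the illustrative order iff x divides y."""
    return y % x == 0

def down_set(antichain):
    """Order ideal generated by an antichain: everything below some element."""
    return {m for m in ELEMENTS if any(leq(m, a) for a in antichain)}

def maximal(elements):
    """Maximal elements of a subset; they always form an antichain."""
    return {x for x in elements
            if not any(x != y and leq(x, y) for y in elements)}

def meet(ac1, ac2):
    """Infimum of two antichains: maximal elements of the common order ideal."""
    return maximal(down_set(ac1) & down_set(ac2))
```

For instance, `meet({4}, {6})` keeps only the elements lying below both 4 and 6 and returns their maximal elements, which in this order is the greatest common divisor.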

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2  1  0  3  2 ]
Positions    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
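A range minimum query structure can be sketched as follows. Note the swap: this is a sparse table, which answers queries in constant time but needs $O(n \log n)$ preprocessing, so it does not reach the $O(n)$ bound quoted above; it is used here only because it is short. The function names are hypothetical, and indices are 0-based, unlike the 1-based positions on the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """0-based index of the minimum of values[lo..hi], both ends inclusive."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

ARRAY = [2, 1, 0, 3, 2]          # the array from the slide
TABLE = build_sparse_table(ARRAY)
whole_range = rmq(ARRAY, TABLE, 0, 4)   # index 2, i.e. position 3 on the slide
```

Each query combines two overlapping power-of-two windows, which is valid because taking a minimum twice over the overlap does not change the result.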

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node        ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D           0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

An RMQ is evaluated for pairs made of one position from $A$ and one position from $B$, for example:

RMQ(4, 4) = 4
RMQ(4, 18) = 15
RMQ(20, 18) = 19

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters.

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

[Figure: taxonomy tree with top ⊤ over C12 and C6; C12 over C10 (with C1, C2) and C11 (with C4, C5); C6 over C13 (with C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, ⊤, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, ⊤, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, ⊤, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
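The scaling to filters can be replayed with the following Python sketch. The parent map encodes the tree as it can be read from the filters listed above; the string "T" stands for the top element ⊤, and the helper names are hypothetical.

```python
# Child -> parent map of the taxonomy tree; "T" denotes the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}

def up_set(node):
    """The node together with all of its ancestors up to the top."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def filter_of(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(c) for c in antichain))

def minimal(elements):
    """Most specific elements: those with no strict descendant in the set."""
    return {x for x in elements
            if not any(y != x and x in up_set(y) for y in elements)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
result = minimal(common)
```

Every filter materialises whole paths to the root, which is the reason the scaling approach touches up to $|T|$ elements per antichain.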

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed to intersect antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
    • Heterogeneous Pattern Structures
    • RDF Pattern Structures
Page 39: Interactive Knowledge Discovery over Web of Data

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node        ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D           0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19, and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
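The walk-through above can be reproduced with the Python sketch below. Several points are assumptions rather than statements from the slides: the tree is rebuilt from the Euler tour shown in the table, "T" stands for ⊤, the RMQ is a plain linear scan standing in for a constant-time structure, and queries are issued only for consecutive positions that come from different antichains, which is one reading of the $A$/$B$ subscripts in the merged list.

```python
# Tree reconstructed from the Euler tour on the slide; "T" is the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Return the visited nodes and their depths (the array D)."""
    nodes, depths = [], []
    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)
    visit(root, 0)
    return nodes, depths

NODES, DEPTHS = euler_tour("T")
FIRST, LAST = {}, {}
for i, name in enumerate(NODES):
    FIRST.setdefault(name, i + 1)   # 1-based positions, as on the slide
    LAST[name] = i + 1

def rmq(lo, hi):
    """1-based position of the minimum depth in [lo, hi] (linear scan stand-in)."""
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])

def is_ancestor(x, y):
    """True if x lies on the path from y to the root (x == y included)."""
    return FIRST[x] <= FIRST[y] <= LAST[x]

def intersect(a, b):
    """Intersection of two antichains via RMQs on consecutive merged positions."""
    tagged = sorted([(FIRST[c], "A") for c in a] + [(FIRST[c], "B") for c in b])
    hits = set()
    for (p, src_p), (q, src_q) in zip(tagged, tagged[1:]):
        if src_p != src_q:               # assumption: only cross-set neighbours
            hits.add(NODES[rmq(p, q) - 1])
    # Keep the most specific hits, dropping strict ancestors of other hits.
    return {x for x in hits
            if not any(x != y and is_ancestor(x, y) for y in hits)}

result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

The final filtering step corresponds to the reduction from $\{4, 8, 15, 19, 21\}$ to $\{4, 21\}$: positions 8 and 15 are ancestors of position 4, and positions 19 and 21 denote the same node.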

Resulting Pattern Concept Lattice

Entities S  Descriptions d
s1          (dc:subject, {C1, C2, C7})
s2          (dc:subject, {C6, C8, C9})
s3          (dc:subject, {C4, C5})
s4          (dc:subject, {C4, C7, C8})
s5          (dc:subject, {C8, C9})

[Figure: subsumption hierarchy with top ⊤ over the classes C12, C10, C11, C15, C13, C14 and C1, C2, C4, C5, C6, C7, C8, C9]

[Figure: pattern concept lattice with nodes K0 to K11]

Pattern Concept lattice

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling.

• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$.

• Not all datasets could be processed using the scaling approach.

What we did so far

How to allow feedback from domain expert

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS.

• Consider that the analyst is not interested in C7, i.e., papers on Question Answering.

[Figure: pattern concept lattice with nodes K0 to K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper dc:creator ?author; ?paper dc:title ?title; ?paper dc:subject ?keywords]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords → Classification.

[Figure: instantiated graph: NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:title title1; NapoliLS97 dc:subject Classification]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \dots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$

• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
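How a VIEW BY clause turns query solutions into a formal context can be sketched in Python as follows. The solution rows are hypothetical and are not the rows of the result table on the slide; `view_by` is an illustrative helper, not an API of any SPARQL engine.

```python
from collections import defaultdict

# Hypothetical query solutions in the column order (title, keyword, author).
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Crawling", "author2"),
    ("title3", "RDF", "author2"),
]

def view_by(rows, key):
    """Formal context for VIEW BY ?key.

    Objects are the values bound to the key variable; attributes are
    (variable, value) pairs taken from the remaining columns.
    """
    k = COLUMNS.index(key)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != k:
                context[row[k]].add((COLUMNS[i], value))
    return dict(context)

by_title = view_by(ROWS, "title")    # G = titles, M = authors and keywords
by_author = view_by(ROWS, "author")  # G = authors, M = titles and keywords
```

Switching the key variable changes which values play the role of objects, so the same solutions yield a different context and hence a different lattice.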

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: query graph: ?x rdf:type Person; ?x dbp:birthPlace Berlin; ?x dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: query graph: ?x dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: query graph: ?x rdf:type Film; ?x dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Predicates A, B, C, D, E with attributes a b c d e f g

Entity   Crosses over a-g
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
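The construction of the context from triples can be sketched in Python. The four Person1 triples are the ones listed on the slide; the two Person6 triples are an assumption added only to show a second row. The helper names are hypothetical.

```python
# Triples as (subject, predicate, object) tuples.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    # Assumed triples for a second entity, not taken from the slide.
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]

def build_context(triples):
    """Scale triples to a binary context: one attribute per (predicate, object)."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence

def cross_table(objects, attributes, incidence):
    """Rows of crosses, in the order of `objects` and `attributes`."""
    return [["x" if (g, m) in incidence else "" for m in attributes]
            for g in objects]

OBJECTS, ATTRIBUTES, INCIDENCE = build_context(TRIPLES)
TABLE = cross_table(OBJECTS, ATTRIBUTES, INCIDENCE)
```

Each cell of `TABLE` is filled exactly when the corresponding triple exists, which mirrors the statement that every cross stands for one triple.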

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.

• $X \equiv Y$ is called a definition.

• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, we may be in the presence of incompleteness in the data.

Rule         Confidence  Support  Meaning
d ⇒ c        100%        5        Every scientist has won a Turing Award.
c ⇒ d        100%        5        Every person who has won a Turing Award is a scientist.
e ⇒ c, d     100%        2        All people whose field is computer science are Turing Award winning scientists.
c, d ⇒ a, b  100%        5        All scientists who have won the Turing Award are categorized as Turing Award Laureates and Computer Scientists.
a, b → c, d  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award.

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description.

• In fact there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
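The support and confidence figures of the running example can be checked with the Python sketch below. The context is an assumption: it is built to agree with the cross counts per entity and with the rule table (all seven persons have a and b, five have c and d, two have e), but the exact placement of e, f and g is a guess. Support is counted on the antecedent, which is the reading under which the table's values are consistent. The helper names are hypothetical.

```python
# Assumed context, consistent with the rule table; e, f, g placement is a guess.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}

def support(attrs):
    """Number of entities having all attributes in attrs."""
    return len(extent(attrs))

def confidence(lhs, rhs):
    """Fraction of the entities satisfying lhs that also satisfy rhs."""
    return support(lhs | rhs) / support(lhs)

def needs_completion(lhs, rhs):
    """Entities that satisfy lhs but lack part of rhs."""
    return sorted(extent(lhs) - extent(lhs | rhs))

conf_ab_cd = confidence({"a", "b"}, {"c", "d"})   # 5 / 7
conf_cd_ab = confidence({"c", "d"}, {"a", "b"})   # 1.0
candidates = needs_completion({"a", "b"}, {"c", "d"})
```

Because one direction is an implication and the other an association rule with high confidence, the entities returned by `needs_completion` are the ones whose descriptions would have to be extended for $a, b \equiv c, d$ to hold as a definition.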

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Cars              Videogames  Smartphones       Countries
Restriction  dc:subject        dc:subject  dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS    dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type    rdf:type          rdf:type
             dc:subject        dc:subject  dc:subject        dc:subject
             bodyStyle         cp          manufacturer      language
             transmission      developer   operativeSystem   governmentType
             assembly          requirement developer         leaderType
             designer          genre       cpu               foundingDate
             layout            releaseDate                   gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        0.7          5982

Dataset characteristics

Evaluation

• No ground truth exists for the triples to be added to each dataset.

• Human evaluation serves as the ground truth; each list of implications has 100% recall.

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries, and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e., cases where $X \to Y$ and $Y \to X$ are association rules with high confidence.

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• Objects may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT, with the edges dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear leading to the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date)]

Why Pattern Structures

Entity | $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (attributes a, b, c, d, e, f, g) | $K_{dbo:productionStartYear}$
Reventon | × × × × × × | $\langle [2008, 2008] \rangle$
Countach | × × × × × | $\langle [1974, 1974] \rangle$
350GT | × × × × × | $\langle [1963, 1963] \rangle$
400GT | × × × × | $\langle [1965, 1965] \rangle$
Islero | × × × × | $\langle [1967, 1967] \rangle$
Veneno | × × | $\langle [2012, 2012] \rangle$
Aventador Roadster | × × | -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$ in the heterogeneous context with numeric values. Then:

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
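The numeric part of this example can be replayed with a few lines of Python. The sketch below is only an illustration, not the implementation behind the slides: it assumes that the description of a set of entities is the smallest interval covering their production start years, and that the extent of an interval is the set of entities whose year falls inside it. The function names `describe` and `extent` are hypothetical.

```python
# Production start years as listed in the heterogeneous context
# (Aventador Roadster has no value in the source table, so it is omitted).
years = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

def describe(entities):
    """Smallest interval [low, high] covering the years of the entities."""
    values = [years[e] for e in entities]
    return (min(values), max(values))

def extent(interval):
    """Entities whose production start year lies inside the interval."""
    low, high = interval
    return {e for e, y in years.items() if low <= y <= high}

a1 = {"350GT", "400GT", "Islero"}
interval = describe(a1)      # plays the role of (A1)'
closure = extent(interval)   # plays the role of (A1)''
print(interval, sorted(closure))
```

Under these assumptions the closure gives back exactly the three starting entities, which agrees with $(A_1)'' = A_1$ on the slide.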

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^{*}$, organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1 | x | x |   |   |   | x |   |
s2 |   |   |   |   | x |   | x | x
s3 |   |   | x | x |   |   |   |
s4 |   |   | x |   |   | x | x |
s5 |   |   |   |   |   |   | x | x

[Figure: subsumption hierarchy with top element $\top$ and the additional classes C10, C11, C12, C13, C14 and C15 above the attributes C1, C2, C4, C5, C6, C7, C8 and C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$

• The maximal elements of an order ideal are pairwise not comparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$
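The definition of the infimum of two antichains translates almost literally into code. The following is a minimal brute-force sketch, assuming the poset is small enough to enumerate and that its order is given as a predicate `leq`. The function name `meet` and the divisibility poset used as data are illustrative choices and do not come from the slides.

```python
def meet(ac1, ac2, elements, leq):
    """Infimum of two antichains: maximal elements of the common downset."""
    # All m that lie below some element of ac1 and below some element of ac2.
    common = {m for m in elements
              if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)}
    # Keep only the maximal elements of that order ideal.
    return {m for m in common
            if not any(m != n and leq(m, n) for n in common)}

# Hypothetical poset: a few integers ordered by divisibility (x <= y iff x divides y).
elements = {1, 2, 3, 4, 6, 9, 12, 18}

def divides(x, y):
    return y % x == 0

result = meet({4, 9}, {6}, elements, divides)
print(sorted(result))
```

Here the common downset is {1, 2, 3}, and its maximal elements 2 and 3 form the resulting antichain. The quadratic scan is chosen for readability only.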

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
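To make the query concrete, here is a small sketch of a range minimum structure on the example array. Note that it uses a sparse table, which needs $O(n \log n)$ preprocessing rather than the $O(n)$ quoted above, while still answering each query in $O(1)$. It is a simpler stand-in, not the scheme referred to on the slide, and the function names are hypothetical.

```python
def build_sparse_table(values):
    """table[k][i] holds the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi], 0-based and inclusive."""
    k = (hi - lo + 1).bit_length() - 1
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# The slide numbers positions from 1, so 1 is added to the 0-based index.
whole_range = rmq(array, table, 0, 4) + 1
last_two = rmq(array, table, 3, 4) + 1
print(whole_range, last_two)
```

On the slide's array the minimum over positions 1 to 5 sits at position 3, and over positions 4 to 5 at position 5.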

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

• RMQ(4, 4) = 4, partial result {4}
• RMQ(4, 18) = 15, partial result {4, 15}
• RMQ(20, 18) = 19, partial result {4, 15, 19}
• Result = {4, 8, 15, 19, 21} = {4, 21}

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain

[Figure: tree T with root ⊤; ⊤ has the children C12 and C6; C12 has the children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has the child C13 (with leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
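The filter-based computation can be checked with a short script. This is a sketch under two assumptions: the tree is the one that can be read off the depth array and node labels of the RMQ slides, and the root ⊤ is written `TOP`. The helper names are hypothetical.

```python
# Child -> parent links of the tree; TOP stands for the root.
parent = {
    "C12": "TOP", "C10": "C12", "C1": "C10", "C2": "C10",
    "C11": "C12", "C4": "C11", "C5": "C11",
    "C6": "TOP", "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}

def ancestors_or_self(node):
    """The node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def filter_of(antichain):
    """Every element that is above or equal to some element of the antichain."""
    return {n for a in antichain for n in ancestors_or_self(a)}

def most_specific(nodes):
    """Drop every node that is a proper ancestor of another node in the set."""
    return {n for n in nodes
            if not any(n != m and n in ancestors_or_self(m) for m in nodes)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
result = most_specific(common)
print(sorted(common), sorted(result))
```

The intersection of the two filters contains six nodes, and keeping only the most specific ones leaves C1 and C13, as stated above.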

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements taken from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 40: Interactive Knowledge Discovery over Web of Data


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$?

Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$

$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = {4, 8, 15, 19, 21} = {4, 21}
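The walk-through above can be reproduced in Python. The sketch below makes several assumptions that the slides do not state: each class is represented by its first position in the tour, only consecutive positions coming from different sets are queried, the RMQ is done by a plain linear scan instead of a constant-time structure, and the final reduction keeps the nodes that are not ancestors of another answer. `TOP` stands for ⊤ and all function names are hypothetical.

```python
# Euler tour of the tree (node labels) and the depth of each visited node.
labels = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
          "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
depth = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last 1-based position of every node in the tour.
first, last = {}, {}
for pos, name in enumerate(labels, start=1):
    first.setdefault(name, pos)
    last[name] = pos

def rmq(lo, hi):
    """1-based position of the minimum depth in [lo, hi] (linear scan)."""
    return min(range(lo, hi + 1), key=lambda p: depth[p - 1])

def intersect(a, b):
    """Positions returned by the RMQs and the resulting antichain of classes."""
    merged = sorted([(first[x], "A") for x in a] + [(first[x], "B") for x in b])
    hits = [rmq(p, q)
            for (p, src_p), (q, src_q) in zip(merged, merged[1:])
            if src_p != src_q]
    nodes = {labels[h - 1] for h in hits}
    # A node n is an ancestor of m when m is first visited inside n's span.
    antichain = {n for n in nodes
                 if not any(n != m and first[n] <= first[m] <= last[n]
                            for m in nodes)}
    return hits, antichain

hits, result = intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(hits, sorted(result))
```

The five queries return the positions 4, 8, 15, 19 and 21, and the reduction keeps the classes C1 and C13, which correspond to the positions 4 and 21 reported on the slide.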

Resulting Pattern Concept Lattice

Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})

[Figure: subsumption hierarchy with top element ⊤ and the classes C10, C11, C12, C13, C14 and C15 above C1, C2, C4, C5, C6, C7, C8 and C9]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID | Extent | Intent
K1 | s1, s2, s4, s5 | (p1, {C14})
K2 | s1, s3, s4 | (p1, {C12})
K3 | s1, s4 | (p1, {C7, C12})
K4 | s2, s4, s5 | (p1, {C8})
K5 | s3, s4 | (p1, {C4})
K6 | s1 | (p1, {C1, C2, C7})
K8 | s2, s5 | (p1, {C8, C9})
K7 | s4 | (p1, {C4, C7, C8})
K9 | s3 | (p1, {C4, C5})
K10 | s2 | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
DBLP | 5293 | 33207 | 33198 | 10134 | 45s | 21s
Biomedical Data | 63 | 1490 | 933 | 1725582 | 145s | 162s

Experiments on Linked Data

Dataset | entities | attributes | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
BK | 96 | 5 | 35 | 626 | 10 | 840897 | 37 sec | 42 sec
LO | 16 | 7 | 16 | 224 | 26 | 1875 | 0.043 sec | 0.088 sec
NT | 131 | 3 | 131 | 140 | 6 | 128624 | 36 sec | 68 sec
PO | 60 | 16 | 22 | 1236 | 58 | 416837 | 49 sec | 57 sec
PT | 5000 | 49 | 22 | 4084 | 60 | 452316 | 50 sec | 38 sec
PW | 200 | 11 | 94 | 436 | 21 | 1148656 | 60 sec | 49 sec
PY | 74 | 28 | 36 | 340 | 53 | 771569 | 46 sec | 40 sec
QU | 2178 | 4 | 44 | 8212 | 8 | 783013 | 28 sec | 30 sec
TZ | 186 | 61 | 31 | 626 | 88 | 650041 | 58 sec | 43 sec
VY | 52 | 4 | 52 | 202 | 15 | 202666 | 59 sec | 116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
• Not all datasets could be processed using the scaling approach

What we did so far

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

How to allow feedback from the domain expert?

Mehwish Alam Interactive Knowledge Discovery over Web of Data 27

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID | Extent | Intent
K1 | s1, s2, s4, s5 | (p1, {C14})
K2 | s1, s3, s4 | (p1, {C12})
K3 | s1, s4 | (p1, {C7, C12})
K4 | s2, s4, s5 | (p1, {C8})
K5 | s3, s4 | (p1, {C4})
K6 | s1 | (p1, {C1, C2, C7})
K8 | s2, s5 | (p1, {C8, C9})
K7 | s4 | (p1, {C4, C7, C8})
K9 | s3 | (p1, {C4, C5})
K10 | s2 | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, where ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu: V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu: V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: one match of the graph pattern, with the nodes title1, NapoliLS97, Amedeo_Napoli and Classification linked by dc:creator, dc:title and dc:subject]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword | ?author
title1 | RDF | author1
title2 | Web Crawling | author1
title2 | Web Search Engines | author2
title2 | RDF | author1
title2 | RDF | author2
title2 | Web Crawling | author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \ldots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$

• $M_2 = \mu(A_{v2}) = \{WebCrawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
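One way to picture what VIEW BY does with a result table is to group the rows by the chosen variable and collect the values of the other variables as attributes. The sketch below is an illustration only and is not the implementation behind Lattice-Based View Access. The rows mirror the result table as it appears in the transcript, and the function name `view_by` is hypothetical.

```python
from collections import defaultdict

COLUMNS = ("title", "keyword", "author")

# Solutions of the query, copied from the result table of the slide.
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]

def view_by(rows, columns, variable):
    """Map each value of the VIEW BY variable to its (variable, value) attributes."""
    idx = columns.index(variable)
    context = defaultdict(set)
    for row in rows:
        for col, value in zip(columns, row):
            if col != variable:
                context[row[idx]].add((col, value))
    return dict(context)

by_title = view_by(rows, COLUMNS, "title")    # objects are titles
by_author = view_by(rows, COLUMNS, "author")  # objects are authors
print(sorted(by_title), sorted(by_author))
```

Switching the variable from `title` to `author` changes which values act as objects of the formal context, so the same result table is seen from a different perspective.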

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern where x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate with value ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: x is linked by dcterms:subject to the category FrenchFilm; in the expanded pattern, x has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicate index | Predicate URI | Object index | Object URI
A | dc:subject | a | dbpc:Computer_Scientists
  |            | b | dbpc:Turing_Award_Laureates
B | dbp:award | c | dbp:TuringAward
C | rdf:type | d | dbo:Scientist
D | dbp:field | e | dbp:Computer Sciences
E | dbp:birthPlace | f | dbo:UnitedStates
  |                | g | dbo:UnitedKingdom

Entity | Crosses over the columns a, b (A), c (B), d (C), e (D), f, g (E)
Person1 | × × × × × ×
Person2 | × × × × ×
Person3 | × × × × ×
Person4 | × × × ×
Person5 | × × × ×
Person6 | × ×
Person7 | × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
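As a rough illustration of this step, the sketch below turns a list of triples into a formal context whose attributes are (predicate, object) pairs, and defines the two derivation operators on it. The four Person1 triples come from the slide. The Person2 triples are hypothetical and were added only so that the operators have something to compare. The helper names are assumptions as well.

```python
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
    # Hypothetical second entity, not taken from the slides.
    ("Person2", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person2", "rdf:type", "dbo:Scientists"),
]

# Each cross of the context is one (predicate, object) pair of a subject.
context = {}
for subject, predicate, obj in triples:
    context.setdefault(subject, set()).add((predicate, obj))

def common_attributes(entities):
    """Attributes shared by all the given entities."""
    sets = [context[e] for e in entities]
    return set.intersection(*sets) if sets else set()

def matching_entities(attributes):
    """Entities that have all the given attributes."""
    return {e for e, attrs in context.items() if attributes <= attrs}

shared = common_attributes({"Person1", "Person2"})
only_field = matching_entities({("dbp:field", "dbp:Computer Sciences")})
print(sorted(shared), sorted(only_field))
```

Applying one operator after the other gives the closure that is used to build the concepts of the lattice.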

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | All the people having the field computer science are Turing Award winning scientists
$c, d \Rightarrow a, b$ | 100% | 5 | All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
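The confidence values of the running example can be recomputed from a small context. The sketch below is restricted to the attributes a, b, c and d. The row contents are an assumption reconstructed from the rule table (all seven persons have a and b, five of them also have c and d), since the exact cross positions are not legible in the transcript. The function names are hypothetical.

```python
# Assumed restriction of the running example to the attributes a, b, c, d.
context = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def support(attributes):
    """Number of entities that have all the given attributes."""
    return sum(1 for description in context.values() if attributes <= description)

def confidence(premise, conclusion):
    """Fraction of the entities matching the premise that also match the conclusion."""
    return support(premise | conclusion) / support(premise)

def counterexamples(premise, conclusion):
    """Entities matching the premise but not the conclusion."""
    return sorted(e for e, d in context.items()
                  if premise <= d and not conclusion <= d)

conf_ab_cd = confidence({"a", "b"}, {"c", "d"})
conf_cd_ab = confidence({"c", "d"}, {"a", "b"})
to_complete = counterexamples({"a", "b"}, {"c", "d"})
print(round(conf_ab_cd, 2), conf_cd_ab, to_complete)
```

With this context the rule from {a, b} to {c, d} has confidence 5/7, roughly 0.71, the converse rule holds exactly, and the counterexamples are the two entities singled out for completion.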

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow with the steps Reference Universe, Formal Context, Mining Implications, Ranking Implications, the test "Can a rule be a definition $X \equiv Y$?" and, on the answer Yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset | Cars | Videogames | Smartphones | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates | rdf:type | rdf:type | rdf:type | rdf:type
 | dc:subject | dc:subject | dc:subject | dc:subject
 | bodyStyle | cp | manufacturer | language
 | transmission | developer | operativeSystem | governmentType
 | assembly | requirement | developer | leaderType
 | designer | genre | cpu | foundingDate
 | layout | releaseDate | | gdpPppRank

FPS: Front_Person_Shooters, cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset | Cars | Videogames | Smartphones | Countries
Subjects | 529 | 655 | 363 | 3153
Objects | 1291 | 3265 | 495 | 8315
Triples | 12519 | 20146 | 4710 | 50000
Concepts | 14657 | 31031 | 1232 | 13754
Exec. time [s] | 1732 | 1714 | 07 | 5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation is taken as the ground truth: each list of implications has a 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (vertical axis, 0.6 to 1.0) against recall (horizontal axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
  • Completing RDF Data using Association Rule Mining
  • Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account

• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may contain different data types and structures, including dates, numbers, collections and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT, with the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and the edge labels dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]

$K_A$ $K_B$ $K_C$ $K_D$ $K_E$ | $K_{dbo:productionStartYear}$
a b c d e f g

Reventon: × × × × × × | $\langle [2008, 2008] \rangle$
Countach: × × × × × | $\langle [1974, 1974] \rangle$
350GT: × × × × × | $\langle [1963, 1963] \rangle$
400GT: × × × × | $\langle [1965, 1965] \rangle$
Islero: × × × × | $\langle [1967, 1967] \rangle$
Veneno: × × | $\langle [2012, 2012] \rangle$
Aventador Roadster: × × | -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures


Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\square\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
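The worked example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the binary part of a description is compared by set intersection and the year part by the convex hull of intervals, and the extra attribute `e` given to 350GT is a hypothetical placement (the slide only fixes the shared attributes a to d and the years).

```python
from functools import reduce

# Hypothetical descriptions: (binary attributes, (start_year, end_year)).
# Only the attributes a-d and the years are taken from the slide; the
# attribute "e" on 350GT is an assumed cross position for illustration.
DESCRIPTIONS = {
    "350GT": ({"a", "b", "c", "d", "e"}, (1963, 1963)),
    "400GT": ({"a", "b", "c", "d"}, (1965, 1965)),
    "Islero": ({"a", "b", "c", "d"}, (1967, 1967)),
}


def meet(d1, d2):
    """Similarity of two descriptions: shared attributes and interval hull."""
    (attrs1, (lo1, hi1)), (attrs2, (lo2, hi2)) = d1, d2
    return attrs1 & attrs2, (min(lo1, lo2), max(hi1, hi2))


def common_description(entities):
    """Fold the meet over the descriptions of all given entities."""
    return reduce(meet, (DESCRIPTIONS[g] for g in entities))


attrs, years = common_description(["350GT", "400GT", "Islero"])
print(sorted(attrs), years)  # ['a', 'b', 'c', 'd'] (1963, 1967)
```

The result matches the intent given on the slide: the attributes a, b, c, d together with the interval [1963, 1967].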

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• An extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1 | x | x |   |   |   | x |   |
s2 |   |   |   |   | x |   | x | x
s3 |   |   | x | x |   |   |   |
s4 |   |   | x |   |   | x | x |
s5 |   |   |   |   |   |   | x | x

[Figure: subsumption hierarchy over the attributes C1, C2 and C4 to C15, rooted at $\top$]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise incomparable

• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array: [ 2 1 0 3 2 ]
Positions: 1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
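As a rough illustration of constant-time range minimum queries, the sketch below uses a sparse table. Note the swap: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ quoted above, which requires a more involved structure; the function names are hypothetical and positions are 0-based here, whereas the slide counts from 1.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the leftmost minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the leftmost minimum in values[lo..hi] (0-based, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]  # the array from the slide
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 0, 4) + 1)  # 3, i.e. position 3 in the slide's 1-based numbering
```

Each query combines two overlapping power-of-two blocks, which is why it takes a constant number of steps regardless of the range length.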

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

D: [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes: $\top$ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 $\top$ C6 C13 C7 C13 C8 C13 C9 C13 C6 $\top$
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

$RMQ(4, 4) = 4$, $RMQ(4, 18) = 15$, $RMQ(20, 18) = 19$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
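The naive approach can be sketched as follows. The tree is rebuilt from the tour shown above (the name `T` stands for the top element), the tour and depth array are generated by a depth-first traversal, and every pair from A and B is resolved with a plain linear scan standing in for a real RMQ structure. The helper names are hypothetical.

```python
# Tree read off the tour on the slide; "T" stands for the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths, revisiting a parent after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("T")
FIRST = {}
for pos, node in enumerate(NODES):
    FIRST.setdefault(node, pos)  # first occurrence of each node in the tour


def lcs(x, y):
    """Least common subsumer: the shallowest node between the two first occurrences."""
    i, j = sorted((FIRST[x], FIRST[y]))
    return NODES[min(range(i, j + 1), key=DEPTHS.__getitem__)]


def intersect_naive(a, b):
    """Query all |A| * |B| pairs, then drop candidates that subsume another candidate."""
    candidates = {lcs(x, y) for x in a for y in b}
    return {c for c in candidates
            if not any(c != d and lcs(c, d) == c for d in candidates)}


print(sorted(intersect_naive({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

The generated tour has 25 entries and reproduces the depth array D of the slide, so the positions 4 and 21 of the result correspond to the nodes C1 and C13.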

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Figure: attribute tree rooted at $\top$, with the subtree C12 (children C10 with leaves C1 and C2, and C11 with leaves C4 and C5) and the subtree C6 (child C13 with leaves C7, C8 and C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
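A minimal sketch of the filter-based scaling, assuming the tree from the figure is stored as child-to-parent links and that "larger" means "closer to the top" (`T` stands for the top element; all function names are hypothetical):

```python
# Child -> parent links of the example tree; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    out = {node}
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out


def fil(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(x) for x in antichain))


def to_antichain(elements):
    """Keep the most specific elements, i.e. those that are no ancestor of another one."""
    return {m for m in elements
            if not any(n != m and m in up_set(n) for n in elements)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(to_antichain(fil(A) & fil(B))))  # ['C1', 'C13']
```

Both filters grow with the height of the tree, which is the reason the scaling is more expensive than working on the antichains directly.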

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Even so, using such a reduction is still computationally more efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

D: [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes: $\top$ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 $\top$ C6 C13 C7 C13 C8 C13 C9 C13 C6 $\top$
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

$A = \{4, 12, 20\}$, $B = \{4, 18, 22\}$

$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

$RMQ(4, 4) = 4$, $RMQ(4, 12) = 8$, $RMQ(12, 18) = 15$, $RMQ(18, 20) = 19$ and $RMQ(20, 22) = 21$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
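The improved approach can be sketched directly on the arrays of the slide. Two points are assumptions of this sketch rather than statements from the slides: only consecutive positions that come from different antichains are queried (in the example every consecutive pair happens to be mixed), and the RMQ is a plain linear scan. Positions are 1-based as on the slide and `T` stands for the top element.

```python
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
NODES = ["T", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4", "C11",
         "C5", "C11", "C12", "T", "C6", "C13", "C7", "C13", "C8", "C13", "C9",
         "C13", "C6", "T"]

FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)  # 1-based first occurrence of each node


def rmq(i, j):
    """1-based position of the leftmost minimum of D between positions i and j."""
    return min(range(i, j + 1), key=lambda p: D[p - 1])


def lcs(x, y):
    i, j = sorted((FIRST[x], FIRST[y]))
    return NODES[rmq(i, j) - 1]


def intersect(a, b):
    """Merge the positions of A and B, query consecutive mixed pairs, keep the most specific hits."""
    merged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[y], "B") for y in b])
    candidates = {NODES[rmq(p, q) - 1]
                  for (p, src_p), (q, src_q) in zip(merged, merged[1:])
                  if src_p != src_q}
    return {c for c in candidates
            if not any(c != d and lcs(c, d) == c for d in candidates)}


print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

Only five queries are issued here instead of the nine needed when every pair from A and B is compared.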

Resulting Pattern Concept Lattice

Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})

[Figure: subsumption hierarchy over the attributes C1, C2 and C4 to C15, rooted at $\top$]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID | Extent | Intent
K1 | s1, s2, s4, s5 | (p1, {C14})
K2 | s1, s3, s4 | (p1, {C12})
K3 | s1, s4 | (p1, {C7, C12})
K4 | s2, s4, s5 | (p1, {C8})
K5 | s3, s4 | (p1, {C4})
K6 | s1 | (p1, {C1, C2, C7})
K8 | s2, s5 | (p1, {C8, C9})
K7 | s4 | (p1, {C4, C7, C8})
K9 | s3 | (p1, {C4, C5})
K10 | s2 | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset |G| |T | |LeavespT q| |L| tRMQ tscalingDBLP 5293 33207 33198 10134 45s 21s

Biomedical Data 63 1490 933 1725582 145s 162s

Experiments on Linked Data

Dataset entities attributes |G | |T | |LeavespT q| |L| tRMQ tscalingBK 96 5 35 626 10 840897 37 sec 42 secLO 16 7 16 224 26 1875 0043 sec 0088 sec

NT 131 3 131 140 6 128624 36 sec 68 secPO 60 16 22 1236 58 416837 49 sec 57 sec

PT 5000 49 22 4084 60 452316 50 sec 38 secPW 200 11 94 436 21 1148656 60 sec 49 secPY 74 28 36 340 53 771569 46 sec 40 secQU 2178 4 44 8212 8 783013 28 sec 30 secTZ 186 61 31 626 88 650041 58 sec 43 secVY 52 4 52 202 15 202666 59 sec 116 sec

Experiments with numerical data from Bilkent University

sbquo Formal contexts were created through inter-ordinal scalingsbquo Structured attributes sets pm1q

1Ď pm2q

1 means that m1 ď m2

sbquo Not all datasets could be processed using scaling approach ()

Mehwish Alam Interactive Knowledge Discovery over Web of Data 26
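For readers unfamiliar with inter-ordinal scaling, here is a small sketch of the standard construction: every numeric value is replaced by the binary attributes "≤ t" and "≥ t" it satisfies, for each threshold t occurring in the data. The object names and values are hypothetical and are not taken from the Bilkent datasets.

```python
def interordinal_scale(values):
    """Map each object to the binary attributes ("<=", t) and (">=", t) it satisfies."""
    thresholds = sorted(set(values.values()))
    context = {}
    for obj, v in values.items():
        context[obj] = ({("<=", t) for t in thresholds if v <= t}
                        | {(">=", t) for t in thresholds if v >= t})
    return context


def extent(context, attribute):
    """All objects that carry the given binary attribute."""
    return {obj for obj, attrs in context.items() if attribute in attrs}


VALUES = {"x1": 1963, "x2": 1965, "x3": 1967}  # hypothetical numeric column
CONTEXT = interordinal_scale(VALUES)

# Attribute extents are nested, which induces the order on the scaled attributes:
# the extent of "<= 1963" is contained in the extent of "<= 1965".
print(extent(CONTEXT, ("<=", 1963)) <= extent(CONTEXT, ("<=", 1965)))  # True
```

The nesting of extents is what makes the scaled attributes a structured (ordered) attribute set in the sense of the bullet above.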

What we did so far


How to allow feedback from domain expert


Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID | Extent | Intent
K1 | s1, s2, s4, s5 | (p1, {C14})
K2 | s1, s3, s4 | (p1, {C12})
K3 | s1, s4 | (p1, {C7, C12})
K4 | s2, s4, s5 | (p1, {C8})
K5 | s3, s4 | (p1, {C4})
K6 | s1 | (p1, {C1, C2, C7})
K8 | s2, s5 | (p1, {C8, C9})
K7 | s4 | (p1, {C4, C7, C8})
K9 | s3 | (p1, {C4, C5})
K10 | s2 | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, with ?paper linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g., $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: instantiated pattern, with NapoliLS97 linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword | ?author
title1 | RDF | author1
title2 | Web Crawling | author1
title2 | Web Search Engines | author2
title2 | RDF | author1
title2 | RDF | author2
title2 | Web Crawling | author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \ldots\}$

• $M_1 = \mu(A_{v_1}) = \{author1, author2, \ldots\}$

• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
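To make the role of VIEW BY concrete, the sketch below turns a table of query answers into a formal context: the values of the VIEW BY variable become the objects and the values of the remaining variables become the attributes. This is an illustration only; `view_by` is a hypothetical helper, not the authors' implementation, and the rows are the ones listed above.

```python
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, object_column):
    """Group the answers by one variable; every other (variable, value) pair is an attribute."""
    idx = columns.index(object_column)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[idx], set())
        attrs.update((col, val) for col, val in zip(columns, row)
                     if col != object_column)
    return context


by_title = view_by(ROWS, COLUMNS, "title")
by_author = view_by(ROWS, COLUMNS, "author")
print(sorted(by_title["title1"]))  # [('author', 'author1'), ('keyword', 'RDF')]
```

Calling the helper with a different `object_column` gives another perspective on the same answers, which is the idea behind changing the VIEW BY variable.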

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern in which ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x linked to FrenchFilm by dcterms:subject, to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicate index | Predicate URI | Object index | Object URI
A | dc:subject | a | dbpc:Computer_Scientists
  |  | b | dbpc:Turing_Award_Laureates
B | dbp:award | c | dbp:TuringAward
C | rdf:type | d | dbo:Scientist
D | dbp:field | e | dbp:Computer Sciences
E | dbp:birthPlace | f | dbo:UnitedStates
  |  | g | dbo:UnitedKingdom

A B C D E
a b c d e f g

Person1: × × × × × ×
Person2: × × × × ×
Person3: × × × × ×
Person4: × × × ×
Person5: × × × ×
Person6: × ×
Person7: × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$

• $X \equiv Y$ is called a definition

• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$ | 100% | 5 | All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description

• In fact there is such a definition, $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
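The figures in the rule table can be reproduced with a short sketch. The placement of the crosses below is an assumption: the slide only fixes the number of crosses per person, so the context was filled in such a way that it agrees with the reported supports and confidences (in particular the positions of e, f and g are hypothetical). Function names are illustrative.

```python
# Hypothetical cross placement, chosen to agree with the rule table above.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attrs):
    """All entities whose description contains every attribute in attrs."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}


def confidence(premise, conclusion):
    """Share of the entities matching the premise that also match the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


def is_definition(x, y):
    """X and Y define each other when both directions hold without exception."""
    return confidence(x, y) == 1.0 and confidence(y, x) == 1.0


ab, cd = {"a", "b"}, {"c", "d"}
print(round(confidence(ab, cd), 2))     # 0.71
print(confidence(cd, ab))               # 1.0
print(sorted(extent(ab) - extent(cd)))  # ['Person6', 'Person7']
```

The entities in the last line are exactly the counterexamples to $a, b \Rightarrow c, d$, i.e., the candidates whose descriptions would be completed if the rule is accepted as a definition.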

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow] Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset | Restriction | Predicates
Cars | dc:subject dbpc:Sports_cars | rdf:type, dc:subject, bodyStyle, transmission, assembly, designer, layout
Videogames | dc:subject dbpc:FPS | rdf:type, dc:subject, cp, developer, requirement, genre, releaseDate
Smartphones | dc:subject dbpc:Smartphones | rdf:type, dc:subject, manufacturer, operativeSystem, developer, cpu
Countries | rdf:type Country | rdf:type, dc:subject, language, governmentType, leaderType, foundingDate, gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset | Cars | Videogames | Smartphones | Countries
Subjects | 529 | 655 | 363 | 3153
Objects | 1291 | 3265 | 495 | 8315
Triples | 12519 | 20146 | 4710 | 50000
Concepts | 14657 | 31031 | 1232 | 13754
Exec time [s] | 1732 | 1714 | 0.7 | 5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset

• Human evaluation as the ground truth: each list of implications has a 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)


RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Points of View

• Hide non-interesting parts of the lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
  • Completing RDF Data using Association Rule Mining
  • Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT, with the edges dc:subject to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date)]

Why Pattern Structures

Contexts: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a, b, c, d, e, f, g) and $K_{dbo:productionStartYear}$

Entity               Crosses among a to g   $K_{dbo:productionStartYear}$
Reventon             × × × × × ×            $\langle [2008, 2008] \rangle$
Countach             × × × × ×              $\langle [1974, 1974] \rangle$
350GT                × × × × ×              $\langle [1963, 1963] \rangle$
400GT                × × × ×                $\langle [1965, 1965] \rangle$
Islero               × × × ×                $\langle [1967, 1967] \rangle$
Veneno               × ×                    $\langle [2012, 2012] \rangle$
Aventador Roadster   × ×                    -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
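
The numeric half of this example can be replayed with a few lines of Python. This is only a sketch under stated assumptions: the names `YEARS`, `describe` and `extent` are hypothetical, the years are the production start years listed in the heterogeneous context, and only the interval part of the closure is computed (the binary attributes are left out because their exact cross positions are not given here).

```python
# Production start years taken from the heterogeneous context
# (Aventador Roadster has no value and is therefore omitted).
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

def describe(entities):
    """Smallest interval covering the years of the given entities."""
    values = [YEARS[e] for e in entities]
    return (min(values), max(values))

def extent(interval):
    """All entities whose year falls inside the interval."""
    low, high = interval
    return {e for e, year in YEARS.items() if low <= year <= high}

A1 = {"350GT", "400GT", "Islero"}
pattern = describe(A1)     # (1963, 1967)
closure = extent(pattern)  # the same three cars, so A1 is closed
print(pattern, sorted(closure))
```

Countach (1974) falls outside the interval, which is why the numeric closure returns exactly $A_1$ while the binary closure is larger.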


The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x               x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: subsumption hierarchy with top element ⊤ over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise incomparable

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
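
As an illustration of the query itself, here is a minimal sketch using a sparse table. Note the swap: a sparse table needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted above, and it is used here only because it is short; queries are still $O(1)$. The function names `build_sparse_table` and `rmq` are hypothetical, and positions are 1-based to match the slide.

```python
def build_sparse_table(values):
    """table[k][i] = 0-based index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """1-based position of the minimum among positions lo..hi (inclusive)."""
    if lo > hi:
        lo, hi = hi, lo
    i, j = lo - 1, hi - 1
    k = (j - i + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 1, 5))  # 3
print(rmq(array, table, 4, 5))  # 5
```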


Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ as positions in the array D.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, partial result $\{4\}$

RMQ(4, 18) = 15, partial result $\{4, 15\}$

RMQ(20, 18) = 19, partial result $\{4, 15, 19\}$

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

⊤
  C12
    C10
      C1
      C2
    C11
      C4
      C5
  C6
    C13
      C7
      C8
      C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
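
The filter-based computation above can be sketched in Python as follows. The `PARENT` map encodes the tree of this example, with the string `"T"` standing in for the top element ⊤; the helper names `filter_of` and `minimal_elements` are hypothetical and this is an illustration, not the implementation used in the experiments.

```python
# Child -> parent map of the example tree; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def filter_of(antichain):
    """All elements lying above (or equal to) some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result

def minimal_elements(nodes):
    """Drop every node that has a strict descendant inside the set."""
    has_descendant = set()
    for n in nodes:
        has_descendant |= (filter_of({n}) - {n}) & nodes
    return nodes - has_descendant

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```

Each filter walks up to the root, so its size grows with the height of the tree, which is the cost that the RMQ-based approach avoids.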


Complexity

Complexity of Naive Approach

The number of RMQs, one per pair of elements, is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 42: Interactive Knowledge Discovery over Web of Data

Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$

RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19 and RMQ(20, 22) = 21

Result = $\{4, 8, 15, 19, 21\} = \{4, 21\}$
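
The merge-and-query procedure above can be sketched in Python. Several things here are assumptions of the sketch rather than part of the slides: `"T"` stands for ⊤, `rmq` is a plain linear scan standing in for a constant-time RMQ structure, consecutive positions coming from the same set are skipped so that every candidate subsumes one element of each set, and the final reduction to the most specific nodes uses first and last occurrences in the tour.

```python
# Euler tour of the example tree ("T" stands for the top element) and node depths.
LABELS = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
          "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last tour position of every node, used for ancestor tests.
FIRST, LAST = {}, {}
for pos, label in enumerate(LABELS, start=1):
    FIRST.setdefault(label, pos)
    LAST[label] = pos

def rmq(lo, hi):
    """1-based position of the shallowest node among positions lo..hi (linear scan)."""
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])

def is_strict_ancestor(u, v):
    return u != v and FIRST[u] <= FIRST[v] <= LAST[u]

def intersect(a_positions, b_positions):
    """Most specific common subsumers of two sets of tour positions."""
    merged = sorted([(p, "A") for p in a_positions] +
                    [(p, "B") for p in b_positions])
    candidates = set()
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:  # only pairs mixing the two sets
            candidates.add(LABELS[rmq(p, q) - 1])
    # Keep a candidate only if it is not above another candidate.
    return {c for c in candidates
            if not any(is_strict_ancestor(c, other) for other in candidates)}

print(sorted(intersect([4, 12, 20], [4, 18, 22])))  # ['C1', 'C13']
```

The candidates found are C1, C12, ⊤ and C13 (positions 4, 8, 15, 19 and 21); dropping C12 and ⊤, which lie above C1, leaves the antichain {C1, C13}, i.e. the positions {4, 21}.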


Resulting Pattern Concept Lattice

Entities S   Descriptions d
s1           (dc:subject, {C1, C2, C7})
s2           (dc:subject, {C6, C8, C9})
s3           (dc:subject, {C4, C5})
s4           (dc:subject, {C4, C7, C8})
s5           (dc:subject, {C8, C9})

[Figure: subsumption hierarchy with top element ⊤ over the classes C1, C2 and C4 to C15]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset           |G|    |T|     |Leaves(T)|   |L|       t_RMQ   t_scaling
DBLP              5293   33207   33198         10134     45s     21s
Biomedical Data   63     1490    933           1725582   145s    162s

Experiments on Linked Data

Dataset   entities   attributes   |G|   |T|    |Leaves(T)|   |L|       t_RMQ       t_scaling
BK        96         5            35    626    10            840897    37 sec      42 sec
LO        16         7            16    224    26            1875      0.043 sec   0.088 sec
NT        131        3            131   140    6             128624    36 sec      68 sec
PO        60         16           22    1236   58            416837    49 sec      57 sec
PT        5000       49           22    4084   60            452316    50 sec      38 sec
PW        200        11           94    436    21            1148656   60 sec      49 sec
PY        74         28           36    340    53            771569    46 sec      40 sec
QU        2178       4            44    8212   8             783013    28 sec      30 sec
TZ        186        61           31    626    88            650041    58 sec      43 sec
VY        52         4            52    202    15            202666    59 sec      116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling

• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$

• Not all datasets could be processed using the scaling approach

What we did so far


How to allow feedback from domain expert


Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS

• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1 : ?keywords \rightarrow Classification$

[Figure: the same graph pattern after applying a mapping: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, ...\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, ...\}$

• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, ...\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
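
To make the role of VIEW BY concrete, the sketch below groups query answers by the chosen object variable, giving one relation per remaining variable (the sets playing the roles of $M_1$ and $M_2$). Everything in it is illustrative: the rows are hypothetical bindings, and `view_by` is a made-up helper, not part of any SPARQL engine or of the authors' tool.

```python
from collections import defaultdict

# Hypothetical query answers as (title, keyword, author) bindings.
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]

def view_by(rows, columns, object_var):
    """For each other variable, map every value of object_var to its attribute values."""
    idx = columns.index(object_var)
    contexts = {c: defaultdict(set) for c in columns if c != object_var}
    for row in rows:
        for column, value in zip(columns, row):
            if column != object_var:
                contexts[column][row[idx]].add(value)
    return contexts

by_title = view_by(ROWS, COLUMNS, "title")    # objects are titles
by_author = view_by(ROWS, COLUMNS, "author")  # objects are authors
for title, authors in sorted(by_title["author"].items()):
    print(title, sorted(authors))
```

Changing the VIEW BY variable only changes which column supplies the objects of the context; the same answers are reorganised, which is what allows the same result set to be explored from the perspective of titles or of authors.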


Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate value $\leq 1900$]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: two graph patterns for the same class: ?x has dcterms:subject FrenchFilm; ?x has rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom

Contexts A, B, C, D, E with attributes a, b, c, d, e, f, g

Entity    Crosses among a to g
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
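
A minimal sketch of this construction, assuming triples are available as plain Python tuples: each (predicate, object) pair becomes one binary attribute and each triple contributes one cross. `TRIPLES` repeats the Person1 example and `build_context` is a hypothetical helper name.

```python
# The example triples for Person1, written as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]

def build_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context

context = build_context(TRIPLES)
attributes = sorted(set().union(*context.values()))

# Print the cross table: one row per subject, one column per attribute.
for subject, attrs in context.items():
    row = ["x" if attribute in attrs else "." for attribute in attributes]
    print(subject, " ".join(row))
```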


Getting definitions from implications and association rules

• If $X \Longrightarrow Y$ and $Y \Longrightarrow X$ then $X \equiv Y$

• $X \equiv Y$ is called a definition

• If $X \Longrightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                          Confidence   Support   Meaning
$d \Longrightarrow c$         100%         5         Every scientist has won a Turing Award
$c \Longrightarrow d$         100%         5         Every person who has won a Turing Award is a scientist
$e \Longrightarrow c, d$      100%         2         All the people whose field is computer science are Turing Award winning scientists
$c, d \Longrightarrow a, b$   100%         5         All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$       71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description

• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Longrightarrow c, d$ and $c, d \Longrightarrow a, b$
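
The confidence values of the running example can be checked with a short Python sketch. The cross positions in `CONTEXT` are an assumption: they are chosen to agree with the number of crosses per entity and with the supports and confidences in the rule table, since the exact positions of e, f and g are not recoverable here. `support` and `confidence` are hypothetical helper names.

```python
# Assumed context, consistent with the rule table (positions of e, f, g are guesses).
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdf"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}

def support(itemset):
    """Number of entities having every attribute of the itemset."""
    return sum(1 for attrs in CONTEXT.values() if itemset <= attrs)

def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)

ab, cd = set("ab"), set("cd")
print(confidence(cd, ab))            # 1.0
print(round(confidence(ab, cd), 2))  # 0.71

# Entities that break the converse rule are the candidates for completion.
incomplete = sorted(e for e, attrs in CONTEXT.items()
                    if ab <= attrs and not cd <= attrs)
print(incomplete)  # ['Person6', 'Person7']
```

Turning the near-definition into a definition amounts to adding the attributes c and d to the two entities listed at the end.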


Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset | Cars | Videogames | Smartphones | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates | rdf:type | rdf:type | rdf:type | rdf:type
 | dc:subject | dc:subject | dc:subject | dc:subject
 | bodyStyle | cp | manufacturer | language
 | transmission | developer | operativeSystem | governmentType
 | assembly | requirement | developer | leaderType
 | designer | genre | cpu | foundingDate
 | layout | releaseDate | | gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset | Cars | Videogames | Smartphones | Countries
Subjects | 529 | 655 | 363 | 3153
Objects | 1291 | 3265 | 495 | 8315
Triples | 12519 | 20146 | 4710 | 50000
Concepts | 14657 | 31031 | 1232 | 13754
Exec. time [s] | 17.32 | 17.14 | 0.7 | 59.82

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation is used as the ground truth; each list of implications has 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
• Perform large-scale experiments.


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections, strings.
• To deal with such data, we use Pattern Structures.

[Figure: RDF graph for the entity 350GT, with nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and 1964 (xsd:date), and edges dc:subject, dc:subject, rdf:type, dbo:manufacturer and dbo:productionYear]

Why Pattern Structures

Entity | $K_A$ to $K_E$ (attributes a b c d e f g) | $K_{dbo:productionStartYear}$
Reventon | × × × × × × | $\langle [2008, 2008] \rangle$
Countach | × × × × × | $\langle [1974, 1974] \rangle$
350GT | × × × × × | $\langle [1963, 1963] \rangle$
400GT | × × × × | $\langle [1965, 1965] \rangle$
Islero | × × × × | $\langle [1967, 1967] \rangle$
Veneno | × × | $\langle [2012, 2012] \rangle$
Aventador Roadster | × × | -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \times D_p \ (p \in P)$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
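The closure worked out above can be reproduced with a small sketch that combines a binary part and an interval part. The years are those of the slide's context. The binary attributes are an assumption: a to d for the first five cars follow from the derivation above, while every other cross placement is illustrative. All function names are hypothetical, and entities without a year are assumed not to be passed to `interval_of`.

```python
# Production start years as listed in the heterogeneous context.
start_year = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

# Assumed binary part: only a-d for the first five cars is implied by the slide.
binary = {
    "Reventon": {"a", "b", "c", "d", "e", "f"},
    "Countach": {"a", "b", "c", "d", "e"},
    "350GT": {"a", "b", "c", "d", "e"},
    "400GT": {"a", "b", "c", "d"},
    "Islero": {"a", "b", "c", "d"},
    "Veneno": {"a", "b"},
    "Aventador Roadster": {"a", "b"},
}


def common_attributes(entities):
    """Binary derivation: attributes shared by all the entities."""
    return set.intersection(*(binary[g] for g in entities))


def binary_extent(attrs):
    """Entities having all the attributes in attrs."""
    return {g for g, desc in binary.items() if attrs <= desc}


def interval_of(entities):
    """Interval derivation: smallest interval covering the entities' years."""
    years = [start_year[g] for g in entities]
    return min(years), max(years)


def interval_extent(interval):
    """Entities whose year falls inside the interval."""
    low, high = interval
    return {g for g, year in start_year.items() if low <= year <= high}


def closure(entities):
    """Heterogeneous closure: intersect the closures of both parts."""
    return binary_extent(common_attributes(entities)) & interval_extent(interval_of(entities))


A1 = {"350GT", "400GT", "Islero"}
print(sorted(common_attributes(A1)))  # ['a', 'b', 'c', 'd']
print(interval_of(A1))                # (1963, 1967)
print(closure(A1) == A1)              # True
```

The binary part alone also returns Reventon and Countach; the interval part is what removes them, which is the point of the heterogeneous description.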


The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.
• The extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$.
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1 | x | x | | | | x | |
s2 | | | | | x | | x | x
s3 | | | x | x | | | |
s4 | | | x | | | x | x |
s5 | | | | | | | x | x

[Figure: subsumption hierarchy with top element $\top$ over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M, x \leq y\}$.
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is an ordered set whose elements are pairwise not comparable.
• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2, m \leq ac_1 \text{ and } m \leq ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array).
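As an illustration of the RMQ problem, here is a minimal sketch based on a sparse table. Note that this is a swapped-in, simpler technique: its preprocessing is $O(n \log n)$ rather than the $O(n)$ bound quoted above, while queries remain $O(1)$. The class name `SparseTableRMQ` is hypothetical and indices are 0-based, so 1 is added to match the slide's positions.

```python
class SparseTableRMQ:
    """Range Minimum Query returning the position of the minimum.

    Sparse-table variant: O(n log n) preprocessing, O(1) per query.
    """

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] is the index of the minimum of values[i : i + 2**k].
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, lo, hi):
        """0-based index of the leftmost minimum of values[lo..hi], inclusive."""
        if lo > hi:
            lo, hi = hi, lo
        k = (hi - lo + 1).bit_length() - 1
        left = self.table[k][lo]
        right = self.table[k][hi - (1 << k) + 1]
        return left if self.values[left] <= self.values[right] else right


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(0, 4) + 1)  # 3: the minimum 0 sits at position 3
print(rmq.query(3, 4) + 1)  # 5: the minimum of [3, 2] sits at position 5
```

Two overlapping power-of-two blocks always cover the queried range, which is why a query needs only two table lookups.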


Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:       ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Positions:   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain.

[Figure: taxonomy with top element $\top$ over the classes C1, C2 and C4 to C13]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
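The filter-based intersection can be checked with a short sketch. The parent map below is an assumption read off the Euler tour shown on the RMQ slides, with `"TOP"` standing for $\top$; the function names are hypothetical.

```python
# Parent of each class in the taxonomy; "TOP" stands for the top element.
parent = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """The node together with everything above it in the taxonomy."""
    chain = {node}
    while node in parent:
        node = parent[node]
        chain.add(node)
    return chain


def filter_of(antichain):
    """Filter of an antichain: all elements above at least one member."""
    result = set()
    for node in antichain:
        result |= ancestors_or_self(node)
    return result


def most_specific(nodes):
    """Drop every node that lies strictly above another node of the set."""
    dominated = set()
    for node in nodes:
        dominated |= ancestors_or_self(node) - {node}
    return nodes - dominated


def intersect_by_scaling(a, b):
    """Intersect two antichains through their filters."""
    return most_specific(filter_of(a) & filter_of(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(filter_of(A) & filter_of(B)))
print(sorted(intersect_by_scaling(A, B)))  # ['C1', 'C13']
```

Each filter can grow to the size of the whole taxonomy, which is the cost the RMQ-based approach avoids.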


Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is costlier than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.


Range Minimum Query - An Implementation

• How to compute the intersection of $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$?

D:         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes:       ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Positions:   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

$A = \{4, 12, 20\}$
$B = \{4, 18, 22\}$
$A \cup B = \{4_A, 4_B, 12_A, 18_B, 20_A, 22_B\}$
$RMQ(4, 4) = 4$, $RMQ(4, 12) = 8$, $RMQ(12, 18) = 15$, $RMQ(18, 20) = 19$ and $RMQ(20, 22) = 21$
Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
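The computation on this slide can be sketched end to end. Several points are assumptions: the tree is read off the Euler tour above, `"TOP"` stands for $\top$, a linear scan stands in for a constant-time RMQ structure, and only consecutive elements coming from different antichains are queried (in the slide's example the merged sequence alternates between A and B, so this gives the same five queries). Indices are 0-based in the code.

```python
# Children of each inner node; "TOP" stands for the top element.
children = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths along the Euler tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour("TOP")
FIRST = {}
for index, name in enumerate(NODES):
    FIRST.setdefault(name, index)
LAST = {name: index for index, name in enumerate(NODES)}


def rmq(i, j):
    """Index of the leftmost minimal depth between i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=DEPTHS.__getitem__)


def is_ancestor(u, v):
    """True if u lies above v (or equals v) in the tree."""
    return FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def intersect(a, b):
    """Intersect two antichains via RMQs on consecutive tour positions."""
    tagged = sorted([(FIRST[x], "A") for x in a] + [(FIRST[x], "B") for x in b])
    candidates = set()
    for (i, source_i), (j, source_j) in zip(tagged, tagged[1:]):
        if source_i != source_j:
            candidates.add(NODES[rmq(i, j)])
    # Keep only the most specific candidates.
    return {c for c in candidates
            if not any(c != d and is_ancestor(c, d) for d in candidates)}


print(DEPTHS)
print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))  # ['C1', 'C13']
```

The final filtering step corresponds to reducing $\{4, 8, 15, 19, 21\}$ to $\{4, 21\}$: C12 and $\top$ lie above C1 or C13 and are discarded.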


Resulting Pattern Concept Lattice

Entities S | Descriptions d
s1 | (dc:subject, {C1, C2, C7})
s2 | (dc:subject, {C6, C8, C9})
s3 | (dc:subject, {C4, C5})
s4 | (dc:subject, {C4, C7, C8})
s5 | (dc:subject, {C8, C9})

[Figure: taxonomy with top element ⊤ over the classes C1, C2 and C4 to C15]

[Figure: Pattern Concept lattice with concepts K0 to K11]

KID | Extent | Intent
K1 | {s1, s2, s4, s5} | (p1, {C14})
K2 | {s1, s3, s4} | (p1, {C12})
K3 | {s1, s4} | (p1, {C7, C12})
K4 | {s2, s4, s5} | (p1, {C8})
K5 | {s3, s4} | (p1, {C4})
K6 | {s1} | (p1, {C1, C2, C7})
K8 | {s2, s5} | (p1, {C8, C9})
K7 | {s4} | (p1, {C4, C7, C8})
K9 | {s3} | (p1, {C4, C5})
K10 | {s2} | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
DBLP | 5293 | 33207 | 33198 | 10134 | 45s | 21s
Biomedical Data | 63 | 1490 | 933 | 1725582 | 145s | 162s

Experiments on Linked Data

Dataset | entities | attributes | $|G|$ | $|T|$ | $|Leaves(T)|$ | $|L|$ | $t_{RMQ}$ | $t_{scaling}$
BK | 96 | 5 | 35 | 626 | 10 | 840897 | 37 sec | 42 sec
LO | 16 | 7 | 16 | 224 | 26 | 1875 | 0.043 sec | 0.088 sec
NT | 131 | 3 | 131 | 140 | 6 | 128624 | 36 sec | 68 sec
PO | 60 | 16 | 22 | 1236 | 58 | 416837 | 49 sec | 57 sec
PT | 5000 | 49 | 22 | 4084 | 60 | 452316 | 50 sec | 38 sec
PW | 200 | 11 | 94 | 436 | 21 | 1148656 | 60 sec | 49 sec
PY | 74 | 28 | 36 | 340 | 53 | 771569 | 46 sec | 40 sec
QU | 2178 | 4 | 44 | 8212 | 8 | 783013 | 28 sec | 30 sec
TZ | 186 | 61 | 31 | 626 | 88 | 650041 | 58 sec | 43 sec
VY | 52 | 4 | 52 | 202 | 15 | 202666 | 59 sec | 116 sec

Experiments with numerical data from Bilkent University

• Formal contexts were created through inter-ordinal scaling.
• Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$.
• Not all datasets could be processed using the scaling approach.

What we did so far


How to allow feedback from domain expert


Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS.
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering.

[Figure: pattern concept lattice with concepts K0 to K11]

KID | Extent | Intent
K1 | {s1, s2, s4, s5} | (p1, {C14})
K2 | {s1, s3, s4} | (p1, {C12})
K3 | {s1, s4} | (p1, {C7, C12})
K4 | {s2, s4, s5} | (p1, {C8})
K5 | {s3, s4} | (p1, {C4})
K6 | {s1} | (p1, {C1, C2, C7})
K8 | {s2, s5} | (p1, {C8, C9})
K7 | {s4} | (p1, {C4, C7, C8})
K9 | {s3} | (p1, {C4, C5})
K10 | {s2} | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, with ?paper linked to ?author via dc:creator, to ?title via dc:title and to ?keywords via dc:subject]

Mapping $\mu: V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu: V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification.

[Figure: the graph pattern instantiated by a mapping, with NapoliLS97 linked to Amedeo_Napoli via dc:creator, to title1 via dc:title and to Classification via dc:subject]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword | ?author
title1 | RDF | author1
title2 | Web Crawling | author1
title2 | Web Search Engines | author2
title2 | RDF | author1
title2 | RDF | author2
title2 | Web Crawling | author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}\}$
• $M_1 = \mu(A_{v1}) = \{\text{author1}, \text{author2}\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
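One way to picture what VIEW BY does with the query answers is the sketch below. It is not the authors' implementation: the function `view_by` and the column names are hypothetical, and the rows are the ones listed in the result table above. The view variable supplies the objects of the formal context and the remaining variables supply the attributes.

```python
# Result rows from the table above: (title, keyword, author).
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, view_variable):
    """Build a formal context: objects come from the view variable,
    attributes are (variable, value) pairs from the other columns."""
    position = columns.index(view_variable)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[position], set())
        for i, value in enumerate(row):
            if i != position:
                attrs.add((columns[i], value))
    return context


by_title = view_by(rows, COLUMNS, "title")
by_author = view_by(rows, COLUMNS, "author")

for obj, attrs in sorted(by_title.items()):
    print(obj, sorted(attrs))
```

Switching the view variable from title to author reuses the same answers but changes which column plays the role of objects, which is how the same query yields different perspectives.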


Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword
  FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: query graph with ?x linked to Person via rdf:type, to Berlin via dbp:birthPlace and to a date ≤ 1900 via dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d
  FILTER (?d <= 1900)
}

French Films

[Figure: query graph with ?x linked to FrenchFilm via dcterms:subject]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

[Figure: query graph with ?x linked to Film via rdf:type and to France via dbo:hasCountry]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

ltPerson1dcsubjectdbpcComputer_Scientistsgt

ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt

ltPerson1dbpfielddbpComputer Sciencesgt

ltPerson1rdftypedboScientistsgt

Predicates ObjectsIndex URI Index URI

A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates

B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates

g dboUnitedKingdom

A B C D Ea b c d e f g

Person1 ˆ ˆ ˆ ˆ ˆ ˆ

Person2 ˆ ˆ ˆ ˆ ˆ

Person3 ˆ ˆ ˆ ˆ ˆ

Person4 ˆ ˆ ˆ ˆ

Person5 ˆ ˆ ˆ ˆ

Person6 ˆ ˆ

Person7 ˆ ˆ

The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt

Getting definitions from implications and association rules

  • If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
  • $X \equiv Y$ is called a definition
  • If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$ | 100% | 5 | All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award

Association rules for the running example

  • Example:
      - $c, d \Rightarrow a, b$
      - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
  • Entities Person6 and Person7 need completion in their description
  • In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
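The confidence figure above can be reproduced with a small sketch. The context below is an assumption: it respects the number of crosses per row and the supports reported on the slide, but the exact column of each cross beyond a, b, c, d is not recoverable from the transcript. The helper names (`extent`, `confidence`, `needs_completion`) are hypothetical, and "support" is taken as the number of entities satisfying the premise, which is what the slide's support column appears to report.

```python
# Assumed running example (cross positions for e, f, g are illustrative guesses).
context = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains every attribute in attrs."""
    return {g for g, desc in context.items() if attrs <= desc}


def support(premise):
    """Number of entities satisfying the premise."""
    return len(extent(premise))


def confidence(premise, conclusion):
    """Fraction of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(extent(premise) - extent(conclusion))


ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))            # implication: confidence 1.0
print(round(confidence(ab, cd), 2))  # association rule: about 0.71
print(needs_completion(ab, cd))      # candidates for completion
```

Under these assumptions the rule $a, b \rightarrow c, d$ fails only for Person6 and Person7, which is why they are the candidates whose descriptions would be completed if the definition is accepted.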

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow] Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset     | Cars                        | Videogames          | Smartphones                 | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates  | rdf:type                    | rdf:type            | rdf:type                    | rdf:type
            | dc:subject                  | dc:subject          | dc:subject                  | dc:subject
            | bodyStyle                   | cp                  | manufacturer                | language
            | transmission                | developer           | operativeSystem             | govenmentType
            | assembly                    | requirement         | developer                   | leaderType
            | designer                    | genre               | cpu                         | foundingDate
            | layout                      | releaseDate         |                             | gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset       | Cars  | Videogames | Smartphones | Countries
Subjects      | 529   | 655        | 363         | 3153
Objects       | 1291  | 3265       | 495         | 8315
Triples       | 12519 | 20146      | 4710        | 50000
Concepts      | 14657 | 31031      | 1232        | 13754
Exec time [s] | 1732  | 1714       | 07          | 5982

Dataset characteristics

Evaluation

  • No ground truth for the triples to be added to each dataset
  • Human evaluation as the ground truth: each list of implications has a 100% recall
  • A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

  • Main Topic of Research of a team
  • Altering Navigation Space
  • Navigation Across Points of View
  • Hide non-interesting parts of the lattice
  • Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
  • Completing RDF Data using Association Rule Mining
  • Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

  • Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
  • How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
  • Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
  • Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
  • Perform large-scale experiments

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis, 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

  • Triples do not always contain URIs as objects
  • They may have different data types and structures, including dates, numbers, collections, strings
  • To deal with such data we use Pattern Structures

[Figure: RDF graph for the entity 350GT]
<350GT, dc:subject, dbpc:Sports_cars>
<350GT, dc:subject, dbpc:Lamborghini_vehicles>
<350GT, rdf:type, dbo:Automobile>
<350GT, dbo:manufacturer, dbp:Lamborghini>
<350GT, dbo:productionYear, 1964 (xsd:date)>

Why Pattern Structures

Contexts $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ over the attributes a b c d e f g, and $K_{dbo:productionStartYear}$:

Entity             | a b c d e f g | dbo:productionStartYear
Reventon           | × × × × × ×   | ⟨[2008, 2008]⟩
Countach           | × × × × ×     | ⟨[1974, 1974]⟩
350GT              | × × × × ×     | ⟨[1963, 1963]⟩
400GT              | × × × ×       | ⟨[1965, 1965]⟩
Islero             | × × × ×       | ⟨[1967, 1967]⟩
Veneno             | × ×           | ⟨[2012, 2012]⟩
Aventador Roadster | × ×           | -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

  • Let a predicate $p \in P$:
      - $range(p) \subseteq U$: $p$ is an object relation
      - $range(p) \subseteq L$: $p$ is a literal relation
  • Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
      - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
      - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
  • Heterogeneous Pattern Structures $(G, H, \Delta)$:
      - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
      - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Entity             | a b c d e f g | dbo:productionStartYear
Reventon           | × × × × × ×   | ⟨[2008, 2008]⟩
Countach           | × × × × ×     | ⟨[1974, 1974]⟩
350GT              | × × × × ×     | ⟨[1963, 1963]⟩
400GT              | × × × ×       | ⟨[1965, 1965]⟩
Islero             | × × × ×       | ⟨[1967, 1967]⟩
Veneno             | × ×           | ⟨[2012, 2012]⟩
Aventador Roadster | × ×           | -

Let $A_1 = \{350GT, 400GT, Islero\}$, then

  • $(A_1)^{\square\square}$:
      - $(A_1)^{\square} = \{a, b, c, d\}$
      - $\{a, b, c, d\}^{\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
      - $(A_1)^{\square\square} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  • $(A_1)''$:
      - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
      - $(A_1)'' = A_1$
  • $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
  • Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
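The closure computed on this slide can be checked with a short sketch. It is an illustration under assumptions: only the attributes a to d are modelled (the positions of the remaining crosses are not recoverable from the transcript), a missing year is treated as "no interval", and all function names are hypothetical.

```python
# Assumed heterogeneous context: (binary attributes, production start year interval).
cars = {
    "Reventon": ({"a", "b", "c", "d"}, (2008, 2008)),
    "Countach": ({"a", "b", "c", "d"}, (1974, 1974)),
    "350GT": ({"a", "b", "c", "d"}, (1963, 1963)),
    "400GT": ({"a", "b", "c", "d"}, (1965, 1965)),
    "Islero": ({"a", "b", "c", "d"}, (1967, 1967)),
    "Veneno": ({"a", "b"}, (2012, 2012)),
    "Aventador Roadster": ({"a", "b"}, None),  # no year on the slide
}


def binary_intent(entities):
    """Attributes shared by all entities (standard FCA derivation)."""
    return set.intersection(*(cars[g][0] for g in entities))


def binary_extent(attrs):
    return {g for g, (desc, _) in cars.items() if attrs <= desc}


def interval_intent(entities):
    """Smallest interval covering the years of all entities, or None if one is missing."""
    intervals = [cars[g][1] for g in entities]
    if any(iv is None for iv in intervals):
        return None
    return (min(lo for lo, _ in intervals), max(hi for _, hi in intervals))


def interval_extent(interval):
    if interval is None:
        return set(cars)
    lo, hi = interval
    return {g for g, (_, iv) in cars.items() if iv and lo <= iv[0] and iv[1] <= hi}


def heterogeneous_closure(entities):
    """Intersect the extents obtained from the binary part and from the interval part."""
    return binary_extent(binary_intent(entities)) & interval_extent(interval_intent(entities))


a1 = {"350GT", "400GT", "Islero"}
print(sorted(binary_intent(a1)), interval_intent(a1))
print(sorted(heterogeneous_closure(a1)))
```

The binary part alone cannot separate the three cars from Reventon and Countach; it is the interval [1963, 1967] that narrows the extent back to $A_1$.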

The proposition of [Carpineto and Romano 1996]

  • Let $K = (G, M, I)$ be a formal context
  • Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$
  • Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1         | x  | x  |    |    |    | x  |    |
s2         |    |    |    |    | x  |    | x  | x
s3         |    |    | x  | x  |    |    |    |
s4         |    |    | x  |    |    | x  | x  |
s5         |    |    |    |    |    |    | x  | x

[Figure: subsumption hierarchy with top element ⊤ over the classes C1, C2, C4, C5, C6, C7, C8, C9 and the intermediate classes C10, C11, C12, C13, C14, C15]

The proposition of [Ganter and Kuznetsov 2001]

  • In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
  • More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
  • Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$
  • Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
  • Recall that an antichain is an ordered set whose elements are pairwise not comparable
  • In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
  • For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

  • Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
  • Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array).
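A range minimum query structure can be sketched as follows. Note the swap: this is a sparse table, which answers queries in $O(1)$ but needs $O(n \log n)$ preprocessing, not the $O(n)$ preprocessing scheme quoted on the slide; it is used here only because it is short. The function names are hypothetical and positions are 0-based, so the slide's position 3 is index 2.

```python
def build_sparse_table(values):
    """table[j][i] = position of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Position of the minimal value in values[lo..hi] (inclusive, 0-based)."""
    j = (hi - lo + 1).bit_length() - 1  # largest power of two fitting in the range
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]  # the array from the slide
table = build_sparse_table(array)
print(rmq(array, table, 0, 4))  # whole array
print(rmq(array, table, 3, 4))  # last two positions
```

Two overlapping blocks of length $2^j$ always cover the queried range, which is why a single comparison per query suffices.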

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ (positions in the array below).

Positions:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Nodes:      ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D:          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

  • $RMQ(4, 4) = 4$, giving $\{4\}$
  • $RMQ(4, 18) = 15$, giving $\{4, 15\}$
  • $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
  • Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
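The naive procedure above can be replayed in code. The tree is read off the node sequence on the slide; the function names are hypothetical, and the range minimum is computed by a plain linear scan for brevity rather than with a constant-time RMQ structure. Indices in the code are 0-based, so the slide's position 4 is index 3.

```python
# Taxonomy as read from the node sequence on the slide ("T" stands for the top element).
tree = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths, revisiting a node after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


nodes, depths = euler_tour("T")
first = {}
for position, node in enumerate(nodes):
    first.setdefault(node, position)  # first occurrence of each node in the tour


def lcs(x, y):
    """Least common subsumer: the shallowest node between the two first occurrences."""
    lo, hi = sorted((first[x], first[y]))
    return nodes[min(range(lo, hi + 1), key=depths.__getitem__)]


def naive_meet(a, b):
    """All pairwise LCS values, keeping only the most specific ones."""
    candidates = {lcs(x, y) for x in a for y in b}
    return {c for c in candidates
            if not any(d != c and lcs(c, d) == c for d in candidates)}


A, B = {"C1", "C5", "C8"}, {"C1", "C7", "C9"}
print(sorted(naive_meet(A, B)))
```

The intermediate candidates are C1, C12, ⊤ and C13; C12 and ⊤ subsume C1 and are discarded, which leaves the antichain {C1, C13}, the nodes at positions 4 and 21 on the slide.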

An associated scaling

  • Scale the antichains to the corresponding filters
  • A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree with top element ⊤; ⊤ has children C12 and C6; C12 has children C10 (over C1, C2) and C11 (over C4, C5); C6 has child C13 (over C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
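The scaling-based intersection can be sketched as follows, assuming the tree shown on the slide. The `parent` map and the function names are hypothetical; the filter of an antichain is taken to be the set of its elements together with all their ancestors.

```python
# Child -> parent links of the taxonomy ("T" stands for the top element).
parent = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the up-sets of the antichain's elements."""
    return set().union(*(up_set(n) for n in antichain))


def meet_by_scaling(a, b):
    """Intersect the two filters and keep the most specific (minimal) elements."""
    common = filter_of(a) & filter_of(b)
    return {x for x in common
            if not any(y != x and x in up_set(y) for y in common)}


A, B = {"C1", "C5", "C8"}, {"C1", "C7", "C9"}
print(sorted(filter_of(A) & filter_of(B)))
print(sorted(meet_by_scaling(A, B)))
```

Each filter can contain a large part of the tree, which is the source of the higher cost of this approach compared with working on the antichains directly.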

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

  • The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
  • When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
  • Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 44: Interactive Knowledge Discovery over Web of Data

Resulting Pattern Concept Lattice

Entities S | Descriptions d
s1         | (dc:subject, {C1, C2, C7})
s2         | (dc:subject, {C6, C8, C9})
s3         | (dc:subject, {C4, C5})
s4         | (dc:subject, {C4, C7, C8})
s5         | (dc:subject, {C8, C9})

[Figure: subsumption hierarchy with top element ⊤ over the classes C1, C2, C4, C5, C6, C7, C8, C9 and the intermediate classes C10, C11, C12, C13, C14, C15]

[Figure: pattern concept lattice with the concepts K0 to K11]

Pattern Concept lattice

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Details of Pattern Concept lattice

Experimentation

Dataset         | |G|  | |T|   | |Leaves(T)| | |L|     | t_RMQ | t_scaling
DBLP            | 5293 | 33207 | 33198       | 10134   | 45s   | 21s
Biomedical Data | 63   | 1490  | 933         | 1725582 | 145s  | 162s

Experiments on Linked Data

Dataset | entities | attributes | |G| | |T|  | |Leaves(T)| | |L|     | t_RMQ     | t_scaling
BK      | 96       | 5          | 35  | 626  | 10          | 840897  | 37 sec    | 42 sec
LO      | 16       | 7          | 16  | 224  | 26          | 1875    | 0.043 sec | 0.088 sec
NT      | 131      | 3          | 131 | 140  | 6           | 128624  | 36 sec    | 68 sec
PO      | 60       | 16         | 22  | 1236 | 58          | 416837  | 49 sec    | 57 sec
PT      | 5000     | 49         | 22  | 4084 | 60          | 452316  | 50 sec    | 38 sec
PW      | 200      | 11         | 94  | 436  | 21          | 1148656 | 60 sec    | 49 sec
PY      | 74       | 28         | 36  | 340  | 53          | 771569  | 46 sec    | 40 sec
QU      | 2178     | 4          | 44  | 8212 | 8           | 783013  | 28 sec    | 30 sec
TZ      | 186      | 61         | 31  | 626  | 88          | 650041  | 58 sec    | 43 sec
VY      | 52       | 4          | 52  | 202  | 15          | 202666  | 59 sec    | 116 sec

Experiments with numerical data from Bilkent University

  • Formal contexts were created through inter-ordinal scaling
  • Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \le m_2$
  • Not all datasets could be processed using the scaling approach ()

What we did so far

How to allow feedback from domain expert

Navigating Concept Lattice

  • Entities are papers and descriptions are the classes from ACCS
  • Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: pattern concept lattice with the concepts K0 to K11]

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

Graph pattern: ?paper dc:creator ?author; ?paper dc:title ?title; ?paper dc:subject ?keywords

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

Instantiated graph: NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:title title1; NapoliLS97 dc:subject Classification

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword           | ?author
title1 | RDF                | author1
title2 | Web Crawling       | author1
title2 | Web Search Engines | author2
title2 | RDF                | author1
title2 | RDF                | author2
title2 | Web Crawling       | author1

  • $E_v = \{title\}$
  • $A_v = \{author, keyword\}$
  • $G = \mu(E_v) = \{title_1, title_2, \dots\}$
  • $M_1 = \mu(A_{v1}) = \{author_1, author_2, \dots\}$
  • $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \dots\}$
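How a VIEW BY clause could turn a result table into formal contexts can be sketched as below. The rows are copied from the slide; the function `view_by` and its return shape are hypothetical and only illustrate the idea that the chosen variable provides the objects while every other variable provides one set of attributes.

```python
# Result rows (title, keyword, author) as listed on the slide.
columns = ("title", "keyword", "author")
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, columns, entity_var):
    """Build one context per remaining variable: {variable: {object: set of attributes}}."""
    entity_index = columns.index(entity_var)
    contexts = {}
    for position, variable in enumerate(columns):
        if variable == entity_var:
            continue
        context = {}
        for row in rows:
            context.setdefault(row[entity_index], set()).add(row[position])
        contexts[variable] = context
    return contexts


by_title = view_by(rows, columns, "title")
by_author = view_by(rows, columns, "author")
print(by_title["author"])
print(by_author["keyword"])
```

Changing the argument from "title" to "author" gives the second perspective on the same answers: authors become the objects, and titles and keywords become their attributes.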

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 34

Mehwish Alam Interactive Knowledge Discovery over Web of Data 35

5Completing RDFData

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x

dctermssubject

SELECT x

WHERE

x dctermssubject categoryFrenchFilm

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x France

Film

dctermssubject

rdftype

dbohasCountry

SELECT x

WHERE

x rdftype dbpCountry

x dbohasCountry dbpFrance

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer_Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

A B C D E
a b c d e f g

Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

  • If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
  • $X \equiv Y$ is called a definition
  • If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        Everyone whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$     100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

  • Example:
    - $c, d \Rightarrow a, b$
    - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
  • Entities Person6 and Person7 need completion in their description
  • In fact there is such a definition $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
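The confidence computation behind the rule table can be sketched in a few lines of Python. This is only an illustration: the placement of crosses in CONTEXT is an assumption chosen to agree with the confidences in the table (the exact columns are not recoverable from the slide), and the function names are hypothetical.

```python
# Hypothetical formal context: entity -> set of attribute indices (a..g).
# The cross placement is an ASSUMPTION that agrees with the rule table.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities whose description contains every attribute in `attributes`."""
    return {e for e, description in CONTEXT.items() if attributes <= description}


def confidence(premise, conclusion):
    """Fraction of entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    if not covered:
        return 0.0
    return len(extent(premise | conclusion)) / len(covered)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack the conclusion."""
    return sorted(extent(premise) - extent(conclusion))


forward = confidence({"c", "d"}, {"a", "b"})   # implication c,d => a,b
backward = confidence({"a", "b"}, {"c", "d"})  # association rule a,b -> c,d
candidates = needs_completion({"a", "b"}, {"c", "d"})
```

Under these assumptions the implication holds with confidence 1 while the converse holds for 5 of the 7 entities, and the two entities missing c and d are the candidates for completion.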

Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

             Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   governmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS = Front_Person_Shooters, cp = computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

               Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

  • No ground truth for the triples to be added to each dataset
  • Human evaluation as the ground truth: each list of implications has a 100% recall
  • A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

  • Main Topic of Research of a team
  • Altering Navigation Space
  • Navigation Across Points of View
  • Hide non-interesting parts of the lattice
  • Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  • Classifying RDF Data with RDF-Pattern Structures
  • Creating Views through SPARQL Queries - Lattice-Based View Access

Ways to utilize this structure
  • Completing RDF Data using Association Rule Mining
  • Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

  • Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
  • How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
  • Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
  • Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
  • Perform large-scale experiments

Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

  • Triples do not always contain URIs as objects
  • They may have different data types and structures, including dates, numbers, collections, strings
  • To deal with such data we use Pattern Structures

[Figure: RDF graph for 350GT with edges dc:subject to dbpc:Sports_cars, dc:subject to dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date)]

Why Pattern Structures

K_A K_B K_C K_D K_E K_dbo:productionStartYear
a b c d e f g

Reventon × × × × × × ⟨[2008, 2008]⟩
Countach × × × × × ⟨[1974, 1974]⟩
350GT × × × × × ⟨[1963, 1963]⟩
400GT × × × × ⟨[1965, 1965]⟩
Islero × × × × ⟨[1967, 1967]⟩
Veneno × × ⟨[2012, 2012]⟩
Aventador Roadster × × -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

  • Let a predicate $p \in P$:
    - $range(p) \subseteq U$: $p$ is an object relation
    - $range(p) \subseteq L$: $p$ is a literal relation
  • Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
    - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
    - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
  • Heterogeneous pattern structure $(G, H, \Delta)$:
    - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
    - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then

  • $(A_1)^{\circ\circ}$:
    - $(A_1)^{\circ} = \{a, b, c, d\}$
    - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
    - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  • $(A_1)''$:
    - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
    - $(A_1)'' = A_1$
  • $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
  • Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
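The numeric part of this closure can be checked with a small Python sketch. The years come from the heterogeneous context on the slides (Aventador Roadster has no value and is left out); the binary closure is copied from the slide rather than recomputed, because the exact cross positions of the binary context are not recoverable. All names are hypothetical.

```python
# Production start years taken from the heterogeneous context on the slides.
PRODUCTION_START_YEAR = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}

# Closure of A1 in the binary part, as stated on the slide (not recomputed here).
BINARY_CLOSURE_OF_A1 = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_description(entities):
    """Smallest interval covering the years of the given entities."""
    years = [PRODUCTION_START_YEAR[e] for e in entities]
    return (min(years), max(years))


def interval_extent(interval):
    """All entities whose year falls inside the interval."""
    low, high = interval
    return {e for e, year in PRODUCTION_START_YEAR.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
numeric_description = interval_description(A1)
numeric_closure = interval_extent(numeric_description)
# The heterogeneous closure is the intersection of the per-component closures.
heterogeneous_closure = BINARY_CLOSURE_OF_A1 & numeric_closure
```

The intersection step mirrors the last closure on the slide: the binary component alone also admits Reventon and Countach, and the interval component rules them out.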

The proposition of [Carpineto and Romano 1996]

  • Let $K = (G, M, I)$ be a formal context
  • Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
  • Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x   .   .   .   x   .   .
s2          .   .   .   .   x   .   x   x
s3          .   .   x   x   .   .   .   .
s4          .   .   x   .   .   x   x   .
s5          .   .   .   .   .   .   x   x

[Figure: subsumption hierarchy over the attributes, with top element ⊤, leaves C1, C2, C4 to C9, and inner nodes C10 to C15]

The proposition of [Ganter and Kuznetsov 2001]

  • In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
  • More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
  • Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$
  • The maximal elements of an ideal are not comparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
  • Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable
  • In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
  • For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

  • Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
  • Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2  1  0  3  2 ]
Positions    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
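To make the RMQ problem concrete, here is a minimal Python sketch on the array from the slide. It uses a sparse table, which needs O(n log n) preprocessing rather than the O(n) bound quoted above; it is swapped in here only because it is short, and it still answers each query in O(1). Positions in the code are 0-based, and the function names are hypothetical.

```python
def build_sparse_table(values):
    """table[j][i] holds the position of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        previous = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = previous[i], previous[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def range_min_position(values, table, lo, hi):
    """Position of the minimum of values[lo..hi] (inclusive, 0-based)."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
whole_range = range_min_position(ARRAY, TABLE, 0, 4)
last_two = range_min_position(ARRAY, TABLE, 3, 4)
```

For the whole array the query returns 0-based position 2, i.e. position 3 in the slide's 1-based numbering, where the value 0 sits.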

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

  • $RMQ(4, 4) = 4$, giving $\{4\}$
  • $RMQ(4, 18) = 15$, giving $\{4, 15\}$
  • $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
  • Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$
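The naive pairwise procedure can be sketched in Python as follows. The tree is read off the Euler tour on the slide, with "TOP" standing for ⊤. For brevity the range minimum is found by a linear scan instead of a constant-time RMQ structure, so this illustrates the logic and not the stated complexity; all names are hypothetical.

```python
from itertools import product

# Tree reconstructed from the Euler tour on the slide; "TOP" is the root.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths (the slide's array D)."""
    tour, depth = [], []

    def visit(node, level):
        tour.append(node)
        depth.append(level)
        for child in CHILDREN.get(node, []):
            visit(child, level + 1)
            tour.append(node)
            depth.append(level)

    visit(root, 0)
    return tour, depth


TOUR, DEPTH = euler_tour("TOP")
FIRST = {}
for position, node in enumerate(TOUR):
    FIRST.setdefault(node, position)  # first occurrence, 0-based


def lcs(u, v):
    """Least common subsumer: shallowest node between the first occurrences."""
    lo, hi = sorted((FIRST[u], FIRST[v]))
    best = min(range(lo, hi + 1), key=DEPTH.__getitem__)  # linear-scan RMQ
    return TOUR[best]


def intersect_antichains(a, b):
    """Pairwise LCS, then drop every node that subsumes another candidate."""
    candidates = {lcs(u, v) for u, v in product(a, b)}
    return {c for c in candidates
            if not any(c != other and lcs(c, other) == c for other in candidates)}


result = intersect_antichains({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

The 1-based first occurrences of C1, C5 and C8 are 4, 12 and 20, matching the slide, and the pairwise candidates reduce to C1 and C13 once ⊤ and C12 are discarded.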

An associated scaling

  • Scale the antichains to the corresponding filters
  • A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
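The filter-based computation can be reproduced with a short Python sketch. The parent map encodes the tree from the slides, with "TOP" standing for ⊤; the function names are hypothetical.

```python
# Parent of every non-root node in the tree from the slides; "TOP" is the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def upset(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the upsets of the antichain's elements."""
    return set().union(*(upset(n) for n in antichain))


def minimal_elements(nodes):
    """Nodes of the set that are not an ancestor of another node of the set."""
    return {n for n in nodes
            if not any(m != n and n in upset(m) for m in nodes)}


FIL_A = filter_of({"C1", "C5", "C8"})
FIL_B = filter_of({"C1", "C7", "C9"})
common = FIL_A & FIL_B
antichain = minimal_elements(common)
```

The intersection contains six nodes, and keeping only its most specific elements gives the same antichain as the RMQ-based approach.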

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$ and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

  • The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
  • When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
  • Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Experimentation

Dataset          |G|   |T|    |Leaves(T)|  |L|      t_RMQ  t_scaling
DBLP             5293  33207  33198        10134    45s    21s
Biomedical Data  63    1490   933          1725582  145s   162s

Experiments on Linked Data

Dataset  entities  attributes  |G|  |T|   |Leaves(T)|  |L|      t_RMQ      t_scaling
BK       96        5           35   626   10           840897   37 sec     42 sec
LO       16        7           16   224   26           1875     0.043 sec  0.088 sec
NT       131       3           131  140   6            128624   36 sec     68 sec
PO       60        16          22   1236  58           416837   49 sec     57 sec
PT       5000      49          22   4084  60           452316   50 sec     38 sec
PW       200       11          94   436   21           1148656  60 sec     49 sec
PY       74        28          36   340   53           771569   46 sec     40 sec
QU       2178      4           44   8212  8            783013   28 sec     30 sec
TZ       186       61          31   626   88           650041   58 sec     43 sec
VY       52        4           52   202   15           202666   59 sec     116 sec

Experiments with numerical data from Bilkent University

  • Formal contexts were created through inter-ordinal scaling
  • Structured attribute sets: $(m_1)' \subseteq (m_2)'$ means that $m_1 \leq m_2$
  • Not all datasets could be processed using the scaling approach (*)

What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

  • Entities are papers and descriptions are the classes from ACCS
  • Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: concept lattice with top K0 and bottom K11; intermediate levels K1 K2, then K4 K3 K5, then K8, then K10 K7 K6 K9]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1: {C14})
K2   s1, s3, s4      (p1: {C12})
K3   s1, s4          (p1: {C7, C12})
K4   s2, s4, s5      (p1: {C8})
K5   s3, s4          (p1: {C4})
K6   s1              (p1: {C1, C2, C7})
K8   s2, s5          (p1: {C8, C9})
K7   s4              (p1: {C4, C7, C8})
K9   s3              (p1: {C4, C5})
K10  s2              (p1: {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern in which ?paper is linked to ?title by dc:title, to ?author by dc:creator and to ?keywords by dc:subject]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: instantiated graph in which NapoliLS97 is linked to title1 by dc:title, to Amedeo_Napoli by dc:creator and to Classification by dc:subject]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

  • $E_v = \{?title\}$
  • $A_v = \{?author, ?keyword\}$
  • $G = \mu(E_v) = \{title_1, title_2, \dots\}$
  • $M_1 = \mu(A_{v1}) = \{author_1, author_2, \dots\}$
  • $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
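How a VIEW BY clause turns query results into a formal context can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: the rows in ROWS are hypothetical solution mappings, and view_by is a hypothetical helper that treats the chosen variable's values as objects and the values of the remaining variables as attributes.

```python
from collections import defaultdict

# Hypothetical solution mappings returned by the SELECT query (one dict per row).
ROWS = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author2"},
    {"title": "title3", "keyword": "RDF", "author": "author2"},
]


def view_by(rows, object_variable):
    """Build a formal context: object -> set of (variable, value) attributes."""
    context = defaultdict(set)
    for row in rows:
        obj = row[object_variable]
        for variable, value in row.items():
            if variable != object_variable:
                # Keeping the variable name separates author and keyword values.
                context[obj].add((variable, value))
    return dict(context)


by_title = view_by(ROWS, "title")    # papers described by authors and keywords
by_author = view_by(ROWS, "author")  # authors described by papers and keywords
```

Switching the object variable from title to author changes which entities are classified in the resulting lattice while reusing the same query results, which is the idea behind viewing the answers from different perspectives.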

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 34

Mehwish Alam Interactive Knowledge Discovery over Web of Data 35

5Completing RDFData

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x

dctermssubject

SELECT x

WHERE

x dctermssubject categoryFrenchFilm

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x France

Film

dctermssubject

rdftype

dbohasCountry

SELECT x

WHERE

x rdftype dbpCountry

x dbohasCountry dbpFrance

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

From RDF Triples to Formal Context

RDF triples:
<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates (index | URI):
A | dc:subject
B | dbp:award
C | rdf:type
D | dbp:field
E | dbp:birthPlace

Objects (index | URI):
a | dbpc:Computer_Scientists
b | dbpc:Turing_Award_Laureates
c | dbp:TuringAward
d | dbo:Scientist
e | dbp:Computer Sciences
f | dbo:UnitedStates
g | dbo:UnitedKingdom

Formal context. Columns a to g are grouped under the predicates: A (a, b), B (c), C (d), D (e), E (f, g). The column positions of the crosses are not preserved in this transcript; only the number of crosses per row is.

Person1 | × × × × × ×
Person2 | × × × × ×
Person3 | × × × × ×
Person4 | × × × ×
Person5 | × × × ×
Person6 | × ×
Person7 | × ×

The formal context is built from RDF triples, after scaling, from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
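The scaling step above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the triples are a hypothetical subset in the spirit of the slide, and the helper names `build_context`, `extent` and `intent` are assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical triples; prefixes are kept as plain strings for illustration.
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]


def build_context(triples):
    """Scale triples into a binary context: subject -> set of (predicate, object)."""
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


def extent(context, attributes):
    """Subjects that carry every attribute in `attributes`."""
    return {g for g, attrs in context.items() if attributes <= attrs}


def intent(context, objects):
    """Attributes shared by every subject in `objects` (all attributes if empty)."""
    if not objects:
        return set().union(*context.values())
    return set.intersection(*(context[g] for g in objects))


context = build_context(triples)
shared = intent(context, {"Person1", "Person6"})
scientists = extent(context, {("rdf:type", "dbo:Scientist")})
```

Each (predicate, object) pair plays the role of one column of the context, so a cross in the table corresponds to one membership in these sets.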

Getting definitions from implications and association rules

• If $X \Longrightarrow Y$ and $Y \Longrightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Longrightarrow Y$ and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data.

Rule | Confidence | Support | Meaning
$d \Longrightarrow c$ | 100% | 5 | Every scientist has won a Turing Award.
$c \Longrightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist.
$e \Longrightarrow c, d$ | 100% | 2 | All the people having the field computer science are Turing Award winning scientists.
$c, d \Longrightarrow a, b$ | 100% | 5 | All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists.
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award.

Table: Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description.
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Longrightarrow c, d$ and $c, d \Longrightarrow a, b$.
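A small sketch of how the confidence of a rule and the entities needing completion could be computed. The context below is hypothetical: it is chosen only so that it agrees with the rule table (a and b hold for all seven persons, c and d for Person1 to Person5), and attributes e, f and g are left out. The function names are assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical context consistent with the rule table above.
context = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
context.update({"Person6": {"a", "b"}, "Person7": {"a", "b"}})


def extent(attributes):
    """Entities whose description contains all of `attributes`."""
    return {g for g, desc in context.items() if attributes <= desc}


def confidence(premise, conclusion):
    """conf(premise -> conclusion) as an exact fraction, or None if undefined."""
    covered = extent(premise)
    if not covered:
        return None
    return Fraction(len(extent(premise | conclusion)), len(covered))


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)


forward = confidence({"c", "d"}, {"a", "b"})    # an implication
backward = confidence({"a", "b"}, {"c", "d"})   # a high-confidence association rule
incomplete = needs_completion({"a", "b"}, {"c", "d"})
```

With this toy context the implication holds with confidence 1, the converse with confidence 5/7 (about 0.71), and the entities returned by `needs_completion` are exactly the ones the slide singles out.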

Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe -> Formal Context -> Mining Implications -> Ranking Implications -> Can a rule be a definition $X \equiv Y$? -> Yes -> Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset | Cars | Videogames | Smartphones | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates | rdf:type | rdf:type | rdf:type | rdf:type
 | dc:subject | dc:subject | dc:subject | dc:subject
 | bodyStyle | cp | manufacturer | language
 | transmission | developer | operativeSystem | governmentType
 | assembly | requirement | developer | leaderType
 | designer | genre | cpu | foundingDate
 | layout | releaseDate | | gdpPppRank

FPS = First_Person_Shooters, cp = computerPlatform

Table: Predicates used to construct each dataset

Dataset characteristics

Dataset | Cars | Videogames | Smartphones | Countries
Subjects | 529 | 655 | 363 | 3153
Objects | 1291 | 3265 | 495 | 8315
Triples | 12519 | 20146 | 4710 | 50000
Concepts | 14657 | 31031 | 1232 | 13754
Exec. time [s] | 1732 | 1714 | 07 | 5982

Table: Dataset characteristics (the decimal points of the execution times are not preserved in this transcript)

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation as the ground truth: each list of implications has 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]
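As an illustration of the precision figure quoted above, precision over the top of a ranked list can be computed as below. The list of expert judgements is made up for the example, and `precision_at_k` is a name introduced here, not one from the talk.

```python
def precision_at_k(judgements, k):
    """Fraction of the top-k ranked implications judged convertible into definitions."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = judgements[:k]
    return sum(top) / len(top)


# Hypothetical expert judgements for a ranked list of ten implications
# (True = the implication can be turned into a definition).
judgements = [True, True, True, False, True, True, True, True, True, True]

precision = precision_at_k(judgements, 10)
```

Here nine of the ten implications are accepted, which gives the precision of 0.9 used as the example on the slide.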


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
- Completing RDF Data using Association Rule Mining
- Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
• Perform large-scale experiments.


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections and strings.
• To deal with such data we use Pattern Structures.

Example graph for the entity 350GT:
350GT --dc:subject--> dbpc:Sports_cars
350GT --dc:subject--> dbpc:Lamborghini_vehicles
350GT --rdf:type--> dbo:Automobile
350GT --dbo:manufacturer--> dbp:Lamborghini
350GT --dbo:productionYear--> 1964 (xsd:date)

Why Pattern Structures

Heterogeneous context with numeric values. Columns a to g belong to the contexts $K_A$ to $K_E$; the last column is $K_{dbo:productionStartYear}$. The column positions of the crosses among a to g are not preserved in this transcript.

Entity | a to g | $K_{dbo:productionStartYear}$
Reventon | × × × × × × | $\langle[2008, 2008]\rangle$
Countach | × × × × × | $\langle[1974, 1974]\rangle$
350GT | × × × × × | $\langle[1963, 1963]\rangle$
400GT | × × × × | $\langle[1965, 1965]\rangle$
Islero | × × × × | $\langle[1967, 1967]\rangle$
Veneno | × × | $\langle[2012, 2012]\rangle$
Aventador Roadster | × × | -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \times_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Using the heterogeneous context with numeric values, let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
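The computation of the common description of $A_1$ can be sketched as follows. This is a simplified illustration, not the authors' code: binary attributes are merged into one set, the numeric part is an interval whose similarity is taken as the convex hull, and the attribute sets are assumptions chosen to be consistent with $(A_1)^{\circ} = \{a, b, c, d\}$ (the extra attribute "e" on 350GT is a guess, since the column positions are not preserved).

```python
from functools import reduce

# Hypothetical descriptions: (set of binary attributes, (start, end) interval).
descriptions = {
    "350GT": ({"a", "b", "c", "d", "e"}, (1963, 1963)),
    "400GT": ({"a", "b", "c", "d"}, (1965, 1965)),
    "Islero": ({"a", "b", "c", "d"}, (1967, 1967)),
}


def meet(d1, d2):
    """Similarity of two descriptions: set intersection and interval convex hull."""
    (attrs1, (lo1, hi1)), (attrs2, (lo2, hi2)) = d1, d2
    return (attrs1 & attrs2, (min(lo1, lo2), max(hi1, hi2)))


def common_description(entities):
    """Fold `meet` over the descriptions of the given entities."""
    return reduce(meet, (descriptions[e] for e in entities))


attrs, interval = common_description(["350GT", "400GT", "Islero"])
```

The result pairs the shared binary attributes with the interval [1963, 1967], which is the intent read as "manufactured between 1963 and 1967" on the slide.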

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.
• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1 | x | x | | | | x | |
s2 | | | | | x | | x | x
s3 | | | x | x | | | |
s4 | | | x | | | x | x |
s5 | | | | | | | x | x

Subsumption hierarchy:
⊤
  C12
    C10
      C1, C2
    C11
      C4, C5
  C15
    C13
      C6
    C14
      C7, C8, C9
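A minimal sketch of an extended intersection over such a hierarchy, assuming the tree shape read off the diagram above (the parent links below are that assumption, and "TOP" stands for the top element). Each pair of classes is replaced by its least common subsumer, and only the most specific results are kept. The function names are introduced here for illustration.

```python
# Assumed taxonomy: child -> parent.
PARENT = {
    "C12": "TOP", "C15": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C15", "C14": "C15",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C6": "C13", "C7": "C14", "C8": "C14", "C9": "C14",
}


def ancestors(c):
    """The class itself followed by all of its subsumers up to TOP."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs(c1, c2):
    """Least common subsumer of two classes in the tree."""
    up = set(ancestors(c1))
    return next(a for a in ancestors(c2) if a in up)


def extended_intersection(s1, s2):
    """Pairwise least common subsumers, reduced to the most specific ones."""
    candidates = {lcs(a, b) for a in s1 for b in s2}
    return {c for c in candidates
            if not any(c != d and c in ancestors(d) for d in candidates)}


s1 = {"C1", "C2", "C7"}
s2 = {"C6", "C8", "C9"}
result = extended_intersection(s1, s2)
```

Under this assumed tree, the only informative class shared by s1 and s2 is the one that subsumes both C7 and C8, C9.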


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$.
• The maximal elements of an order ideal are pairwise not comparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable.
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$.
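The infimum of two antichains can be written down almost literally from the definition above. The sketch below uses a small made-up poset (it is not taken from the slides) given by direct upper covers; all names are assumptions for illustration, and no attempt is made at efficiency.

```python
# Hypothetical poset: element -> set of its direct upper covers.
UPPER = {"x": {"p", "q"}, "y": {"q"}, "p": {"t"}, "q": {"t"}, "t": set()}


def up_closure(m):
    """m together with every element above it."""
    seen, stack = {m}, [m]
    while stack:
        for u in UPPER[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def leq(m, n):
    """True if m <= n in the poset."""
    return n in up_closure(m)


def ideal(antichain):
    """Order ideal (downset) generated by an antichain."""
    return {m for m in UPPER if any(leq(m, a) for a in antichain)}


def maximal(elements):
    """Maximal elements of a set, i.e. the antichain describing it."""
    return {m for m in elements
            if not any(m != n and leq(m, n) for n in elements)}


def infimum(ac1, ac2):
    """Maximal elements of the intersection of the two generated ideals."""
    return maximal(ideal(ac1) & ideal(ac2))
```

For example, in this toy poset `infimum({"p"}, {"q"})` keeps only the element lying below both p and q.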

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array:     [ 2  1  0  3  2 ]
Positions:   1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
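For illustration, here is a range minimum query structure based on a sparse table. Note that this is a simpler, swapped-in technique: it answers queries in constant time but needs O(n log n) preprocessing, not the O(n) bound quoted above. Indices are 0-based here, whereas the slide counts positions from 1.

```python
def build_sparse_table(values):
    """table[j][i] = index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[-1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the minimum of values[lo..hi], both ends inclusive."""
    j = (hi - lo + 1).bit_length() - 1
    left, right = table[j][lo], table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]          # the array from the slide
table = build_sparse_table(values)
whole_range = rmq(values, table, 0, 4)
```

On the slide's array the minimum over the whole range sits at 0-based index 2, i.e. position 3 in the slide's numbering.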

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the tour below are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Pos     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D       0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node    ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤

One RMQ is computed for each pair of positions, for example:
- $RMQ(4, 4) = 4$, result so far $\{4\}$
- $RMQ(4, 18) = 15$, result so far $\{4, 15\}$
- $RMQ(20, 18) = 19$, result so far $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$ once only the most specific nodes are kept.
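The naive approach can be replayed in Python on the tour and depth array from the slide. This is a sketch under simplifying assumptions: the RMQ is a plain linear scan rather than a constant-time structure, "TOP" stands for the top element, and the helper names are introduced here.

```python
# Euler tour and depth array copied from the slide.
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def first_position(node):
    """1-based position of the first occurrence of a node in the tour."""
    return TOUR.index(node) + 1


def lcs(x, y):
    """Least common subsumer of two nodes, read off the tour."""
    return TOUR[rmq(first_position(x), first_position(y)) - 1]


def naive_intersection(a, b):
    """One RMQ per pair, then keep only the most specific nodes."""
    positions = {rmq(first_position(x), first_position(y)) for x in a for y in b}
    nodes = {TOUR[p - 1] for p in positions}
    antichain = {c for c in nodes
                 if not any(c != d and lcs(c, d) == c for d in nodes)}
    return positions, antichain


positions, antichain = naive_intersection(["C1", "C5", "C8"], ["C1", "C7", "C9"])
```

The set of positions reproduces the intermediate result listed on the slide, and the final antichain agrees with its last line.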

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

Hierarchy:
⊤
  C12
    C10
      C1, C2
    C11
      C4, C5
  C6
    C13
      C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
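The same intersection can be reproduced with filters. The sketch below encodes the hierarchy of this slide as child-to-parent links (with "TOP" for the top element); the function names are assumptions made for the example.

```python
# Hierarchy from the slide: child -> parent.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All elements greater than or equal to some element of the antichain."""
    fil = set()
    for c in antichain:
        fil.add(c)
        while c in PARENT:
            c = PARENT[c]
            fil.add(c)
    return fil


def antichain_of(elements):
    """Most specific elements: drop everything that lies strictly above another element."""
    strict_ancestors = set()
    for c in elements:
        while c in PARENT:
            c = PARENT[c]
            strict_ancestors.add(c)
    return elements - strict_ancestors


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
result = antichain_of(common)
```

Both filters contain every ancestor of their generators, which is why this route touches far more elements than the handful of positions used by the RMQ-based approach.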

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
What we did so far

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS.
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering.

[Figure: concept lattice with levels, from top to bottom: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID | Extent | Intent
K1 | s1, s2, s4, s5 | $(p_1, \{C_{14}\})$
K2 | s1, s3, s4 | $(p_1, \{C_{12}\})$
K3 | s1, s4 | $(p_1, \{C_7, C_{12}\})$
K4 | s2, s4, s5 | $(p_1, \{C_8\})$
K5 | s3, s4 | $(p_1, \{C_4\})$
K6 | s1 | $(p_1, \{C_1, C_2, C_7\})$
K8 | s2, s5 | $(p_1, \{C_8, C_9\})$
K7 | s4 | $(p_1, \{C_4, C_7, C_8\})$
K9 | s3 | $(p_1, \{C_4, C_5\})$
K10 | s2 | $(p_1, \{C_6, C_8, C_9\})$

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.



4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

Graph pattern: ?paper --dc:creator--> ?author; ?paper --dc:title--> ?title; ?paper --dc:subject--> ?keywords

Mapping $\mu : V \rightarrow U$

Given U and V, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g. $\mu_1$: ?keywords $\rightarrow$ Classification.

Instantiated pattern: NapoliLS97 --dc:creator--> Amedeo_Napoli; NapoliLS97 --dc:title--> title1; NapoliLS97 --dc:subject--> Classification

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword | ?author
title1 | RDF | author1
title2 | Web Crawling | author1
title2 | Web Search Engines | author2
title2 | RDF | author1
title2 | RDF | author2
title2 | Web Crawling | author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
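How a VIEW BY clause turns query answers into a formal context can be sketched as follows. This is an in-memory illustration only, not the authors' system: the rows are the answers as transcribed on the slide, and `view_by` and `COLUMNS` are hypothetical names.

```python
from collections import defaultdict

# Answer rows as transcribed on the slide: (title, keyword, author).
COLUMNS = ("title", "keyword", "author")
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, column):
    """Context whose objects are the values of `column`; the attributes are
    (column name, value) pairs taken from the remaining columns."""
    idx = COLUMNS.index(column)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != idx:
                context[row[idx]].add((COLUMNS[i], value))
    return dict(context)


by_title = view_by(rows, "title")
by_author = view_by(rows, "author")
```

Switching the argument from "title" to "author" changes which variable supplies the objects of the context, which is the idea behind viewing the same answers from different perspectives.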

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 34

Mehwish Alam Interactive Knowledge Discovery over Web of Data 35

5Completing RDFData

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x

dctermssubject

SELECT x

WHERE

x dctermssubject categoryFrenchFilm

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x France

Film

dctermssubject

rdftype

dbohasCountry

SELECT x

WHERE

x rdftype dbpCountry

x dbohasCountry dbpFrance

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

From RDF Triples to Formal Context

RDF triples

ltPerson1dcsubjectdbpcComputer_Scientistsgt

ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt

ltPerson1dbpfielddbpComputer Sciencesgt

ltPerson1rdftypedboScientistsgt

Predicates ObjectsIndex URI Index URI

A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates

B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates

g dboUnitedKingdom

A B C D Ea b c d e f g

Person1 ˆ ˆ ˆ ˆ ˆ ˆ

Person2 ˆ ˆ ˆ ˆ ˆ

Person3 ˆ ˆ ˆ ˆ ˆ

Person4 ˆ ˆ ˆ ˆ

Person5 ˆ ˆ ˆ ˆ

Person6 ˆ ˆ

Person7 ˆ ˆ

The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt

Mehwish Alam Interactive Knowledge Discovery over Web of Data 37

Getting denitions from implications and association rules

sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data

Rule Condence Support Meaning

d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a

Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-

rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and

Computer Scientists are Scientists who have won Turing Award

Association rules for the running example

sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071

sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b


Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition X ≡ Y? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

              Cars              Videogames   Smartphones       Countries
Restriction   dc:subject        dc:subject   dc:subject        rdf:type
              dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates    rdf:type          rdf:type     rdf:type          rdf:type
              dc:subject        dc:subject   dc:subject        dc:subject
              bodyStyle         cp           manufacturer      language
              transmission      developer    operativeSystem   govenmentType
              assembly          requirement  developer         leaderType
              designer          genre        cpu               foundingDate
              layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

               Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of the lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., cases where X → Y and Y → X are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph of 350GT, with dc:subject dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini and dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

K_A  K_B  K_C  K_D  K_E  K_dbo:productionStartYear
a b  c  d  e  f g

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let p ∈ P be a predicate:
  - range(p) ⊆ U: p is an object relation
  - range(p) ⊆ L: p is a literal relation
• Pattern structure K_p = (G, (D_p, ⊓), δ_p):
  - (D_p, ⊑) is an ordered set of descriptions defined for the elements in range(p)
  - δ_p maps entities g ∈ G to their descriptions in D_p
• Heterogeneous pattern structure (G, H, Δ):
  - H = ⨉ D_p (p ∈ P) is the Cartesian product of all the description sets D_p
  - Δ maps an entity g ∈ G to a tuple where each component corresponds to a description in a set D_p

Heterogeneous Pattern Structures

Let A1 = {350GT, 400GT, Islero} in the heterogeneous context with numeric values; then:

• (A1)°°
  - (A1)° = {a, b, c, d}
  - {a, b, c, d}° = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)°° = {Reventon, Countach, 350GT, 400GT, Islero}
• (A1)″
  - (A1)′ = [1963-1967] and [1963-1967]′ = A1
  - (A1)″ = A1
• (A1)⋄⋄ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1
• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"
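The closure computed on this slide can be checked with a short sketch. Several things here are assumptions rather than the authors' implementation: the binary part keeps only attributes a to d (the columns that the slide's computation fixes), the interval part uses the production start years from the context, and the function names are hypothetical. The functions expect a non-empty set of entities that all have a year.

```python
ABCD = {"a", "b", "c", "d"}

# Binary part of the context, restricted to attributes a-d (assumption).
BINARY = {
    "Reventon": set(ABCD),
    "Countach": set(ABCD),
    "350GT": set(ABCD),
    "400GT": set(ABCD),
    "Islero": set(ABCD),
    "Veneno": {"a", "b"},
    "Aventador Roadster": {"a", "b"},
}

# Numeric part; Aventador Roadster has no value in the context.
YEAR = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}


def binary_closure(entities):
    """Entities sharing every binary attribute common to the given entities."""
    intent = set.intersection(*(BINARY[g] for g in entities))
    return {g for g, desc in BINARY.items() if intent <= desc}


def interval_closure(entities):
    """Smallest interval covering the entities' years, and the entities inside it."""
    years = [YEAR[g] for g in entities]
    low, high = min(years), max(years)
    return (low, high), {g for g, y in YEAR.items() if low <= y <= high}


def heterogeneous_closure(entities):
    """Intersection of the component-wise closures, as on the slide."""
    _, inside_interval = interval_closure(entities)
    return binary_closure(entities) & inside_interval


A1 = {"350GT", "400GT", "Islero"}
print(sorted(heterogeneous_closure(A1)))
```

The binary closure alone adds Reventon and Countach to A1; intersecting with the interval closure removes them again, which is why A1 is closed in the heterogeneous structure.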


The proposition of [Carpineto and Romano 1996]

• Let K = (G, M, I) be a formal context
• Extended set of attributes M*, organized in a subsumption hierarchy based on a partial ordering ≤_M*
• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x       x   x
s3                  x   x
s4                  x           x   x
s5                                  x   x

[Figure: subsumption hierarchy with root ⊤ over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set (M, ≤_M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M : x ≤ y}
• The maximal elements of an ideal are pairwise not comparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable
• In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set
• For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2 : m ≤ ac1 and m ≤ ac2}

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and O(1) time per query (where n is the number of elements in the array).
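To make the RMQ interface concrete, here is a minimal sketch that uses a sparse table. Note the swap: a sparse table needs O(n log n) preprocessing, not the O(n) bound quoted above, so this is a simpler stand-in and not the method the slide refers to. The function names are hypothetical and positions are 0-based, whereas the slide counts from 1.

```python
def build_sparse_table(values):
    """table[j][i] is the position of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def range_min_position(values, table, lo, hi):
    """0-based position of the minimum of values[lo..hi], both ends included."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]  # the array from the slide
table = build_sparse_table(array)
# Whole array: the minimum 0 sits at 0-based position 2 (slide position 3).
print(range_min_position(array, table, 0, 4))
```

Each query combines two precomputed, possibly overlapping, power-of-two windows, which is what gives constant time per query once the table is built.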


Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9}, the positions in the tour below are A = {4, 12, 20} and B = {4, 18, 22}.

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

One RMQ is computed per pair of positions, for example RMQ(4, 4) = 4, RMQ(4, 18) = 15 and RMQ(20, 18) = 19.

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e., {C1, C13}
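The naive procedure can be replayed with the tour and depth arrays from this slide. The sketch below is an assumption-laden illustration: the RMQ is a plain linear scan rather than a constant-time structure, "TOP" stands for ⊤, and the helper names are hypothetical. Positions are 1-based to match the slide.

```python
# Tour of the class tree and the depth of each visited node, from the slide.
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the smallest depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda k: DEPTH[k - 1])


def position(node):
    """1-based position of the first visit of node in the tour."""
    return TOUR.index(node) + 1


def lcs(x, y):
    """Closest common ancestor of x and y, read off the tour."""
    return TOUR[rmq(position(x), position(y)) - 1]


def intersect_antichains(a, b):
    """One RMQ per pair, then keep only the most specific classes."""
    candidates = {lcs(x, y) for x in a for y in b}
    return {c for c in candidates
            if not any(d != c and lcs(c, d) == c for d in candidates)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect_antichains(A, B)))
```

The final filter drops every candidate that is an ancestor of another candidate, which corresponds to the reduction from {4, 8, 15, 19, 21} to {4, 21} on the slide.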


An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
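The filter-based computation can be sketched as follows. The parent links are read off the tree on this slide, "TOP" stands for ⊤, and the helper names are hypothetical; this is an illustration of the scaling idea, not the authors' code.

```python
# Child -> parent links of the class tree drawn on the slide.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All elements that are above or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def most_specific(nodes):
    """Elements of an upward-closed set that have no child inside the set."""
    parents_inside = {PARENT[n] for n in nodes if n in PARENT}
    return nodes - parents_inside


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(most_specific(common)))
```

Every filter walks up to the root, so its size grows with the height of the tree, which is the cost that the RMQ-based intersection avoids.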


Complexity

Complexity of Naive Approach

The number of RMQs, one per pair of elements, is O(|A||B|).

Complexity of Improved Approach

The cardinality of A ∪ B is at most |A| + |B|, hence the number of pairs of consecutive elements is O(|A| + |B|), and thus the number of RMQs of consecutive elements is O(|A| + |B|).

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more expensive than the O(|Leaves(T)|) needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

Page 47: Interactive Knowledge Discovery over Web of Data

How to allow feedback from the domain expert?

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: concept lattice with top K0, bottom K11 and the concepts K1 to K10 listed below]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.


4 Creating Views over RDF-Graphs


SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping µ: V → U

Given U and V, a mapping µ is a partial function µ: V → U, e.g., µ1: ?keywords → Classification

[Figure: instantiated graph: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]


Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title   ?keyword            ?author
title1   RDF                 author1
title2   Web Crawling        author1
title2   Web Search Engines  author2
title2   RDF                 author1
title2   RDF                 author2
title2   Web Crawling        author1

• E_v = {?title}
• A_v = {?author, ?keyword}
• G = µ(E_v) = {title1, title2, ...}
• M1 = µ(A_v1) = {author1, author2, ...}
• M2 = µ(A_v2) = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
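The effect of VIEW BY on a result table can be sketched in Python. This is an illustration under assumptions, not the LBVA implementation: the rows are the answers listed on the slide, the function name `view_by` is hypothetical, and attributes are encoded as (variable, value) pairs.

```python
COLUMNS = ("title", "keyword", "author")

# Answer rows as listed on the slide.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]


def view_by(rows, object_variable):
    """Group the answers by one variable; the other variables become attributes."""
    key = COLUMNS.index(object_variable)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[key], set())
        for i, value in enumerate(row):
            if i != key:
                attrs.add((COLUMNS[i], value))
    return context


by_title = view_by(ROWS, "title")    # objects are titles
by_author = view_by(ROWS, "author")  # same answers, seen from the authors
for entity, attrs in sorted(by_author.items()):
    print(entity, sorted(attrs))
```

Switching the VIEW BY variable changes which values play the role of the objects G and which play the role of the attributes M1 and M2, without rerunning the query.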


Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data


Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern: ?x has rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern: ?x has dcterms:subject FrenchFilm, rdf:type Film and dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

ltPerson1dcsubjectdbpcComputer_Scientistsgt

ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt

ltPerson1dbpfielddbpComputer Sciencesgt

ltPerson1rdftypedboScientistsgt

Predicates ObjectsIndex URI Index URI

A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates

B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates

g dboUnitedKingdom

A B C D Ea b c d e f g

Person1 ˆ ˆ ˆ ˆ ˆ ˆ

Person2 ˆ ˆ ˆ ˆ ˆ

Person3 ˆ ˆ ˆ ˆ ˆ

Person4 ˆ ˆ ˆ ˆ

Person5 ˆ ˆ ˆ ˆ

Person6 ˆ ˆ

Person7 ˆ ˆ

The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt

Mehwish Alam Interactive Knowledge Discovery over Web of Data 37

Getting denitions from implications and association rules

sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data

Rule Condence Support Meaning

d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a

Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-

rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and

Computer Scientists are Scientists who have won Turing Award

Association rules for the running example

sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071

sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b

Mehwish Alam Interactive Knowledge Discovery over Web of Data 38

Possible Scenario [Alam et al IJCAI 2015]

ReferenceUniverse

FormalContext

MiningImplications

RankingImplications

Can a rule be adenitionX rdquo Y

Yes

CompleteData

Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov

Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 39

Experimentation

Dataset building conditions

Dataset      Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS*    dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp**         manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

* Front_Person_Shooters   ** computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e. X → Y and Y → X are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

Why Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections, strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT, with edges dc:subject to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini and dbo:productionYear to 1964 (xsd:date)]

Entity              a-g (KA: a, b; KB: c; KC: d; KD: e; KE: f, g)  K_dbo:productionStartYear
Reventon            × × × × × ×                                    ⟨[2008, 2008]⟩
Countach            × × × × ×                                      ⟨[1974, 1974]⟩
350GT               × × × × ×                                      ⟨[1963, 1963]⟩
400GT               × × × ×                                        ⟨[1965, 1965]⟩
Islero              × × × ×                                        ⟨[1967, 1967]⟩
Veneno              × ×                                            ⟨[2012, 2012]⟩
Aventador Roadster  × ×                                            -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let p ∈ P be a predicate:
  - range(p) ⊆ U: p is an object relation
  - range(p) ⊆ L: p is a literal relation
• Pattern Structures K_p = (G, (D_p, ⊓), δ_p)
  - (D_p, ⊑) is an ordered set of descriptions defined for the elements in range(p)
  - δ_p maps entities g ∈ G to their descriptions in D_p
• Heterogeneous Pattern Structures (G, H, Δ)
  - H = ⨉_{p ∈ P} D_p is the Cartesian product of all the description sets D_p
  - Δ maps an entity g ∈ G to a tuple where each component corresponds to a description in a set D_p

Heterogeneous Pattern Structures

Let A1 = {350GT, 400GT, Islero}, then:

• (A1)^∘∘
  - (A1)^∘ = {a, b, c, d}
  - {a, b, c, d}^∘ = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)^∘∘ = {Reventon, Countach, 350GT, 400GT, Islero}
• (A1)′′
  - (A1)′ = [1963, 1967] and [1963, 1967]′ = A1
  - (A1)′′ = A1
• (A1)^⋄⋄ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1
• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"
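The interval part of this closure can be checked with a few lines of Python. This is only a sketch under stated assumptions: the names `YEARS`, `meet`, `describe` and `extent` are hypothetical, the similarity of two intervals is taken to be their convex hull (the usual choice for interval pattern structures), and the binary attributes a to g are left out because their exact columns are not needed here.

```python
# Production start years copied from the heterogeneous context.
# Aventador Roadster has no value ("-"), so it is left out of the interval part.
YEARS = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
}

def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def describe(entities):
    """A' : the common interval description of a list of entities."""
    intervals = [YEARS[e] for e in entities]
    result = intervals[0]
    for interval in intervals[1:]:
        result = meet(result, interval)
    return result

def extent(interval):
    """d' : the entities whose own interval lies inside the given interval."""
    low, high = interval
    return {e for e, (lo, hi) in YEARS.items() if low <= lo and hi <= high}

A1 = {"350GT", "400GT", "Islero"}
d = describe(sorted(A1))
print(d)                 # (1963, 1967)
print(extent(d) == A1)   # True
```

Intersecting this extent with the extent obtained from the binary attributes gives A1 again, which is the step written (A1)^⋄⋄ on the slide.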

The proposition of [Carpineto and Romano 1996]

• Let K = (G, M, I) be a formal context
• Extended set of attributes M* organized in a subsumption hierarchy based on a partial ordering ≤_M*
• Extended intersection operation on s1 = {C1, C2, C7} and s2 = {C6, C8, C9}

Entities S  C1  C2  C4  C5  C6  C7  C8  C9
s1          x   x                   x
s2                          x           x   x
s3                  x   x
s4                  x               x   x
s5                                      x   x

[Figure: subsumption hierarchy with top ⊤; below ⊤ are C12 and C15; below C12 are C10 (over C1, C2) and C11 (over C4, C5); below C15 are C13 (over C6) and C14 (over C7, C8, C9)]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set (M, ≤_M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M: x ≤ y}
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise not comparable
• In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set
• For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2: m ≤ ac1 and m ≤ ac2}

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and O(1) time per query, where n is the number of elements in the array.
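To make the RMQ problem concrete, here is a minimal Python sketch. It uses a sparse table, which is a simpler technique than the one the slide's bound refers to: preprocessing takes O(n log n) rather than O(n), while each query is still O(1). The function names `build_sparse_table` and `rmq` are hypothetical, and indices are 0-based in the code while the slide counts positions from 1.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi], both ends inclusive, 0-based."""
    k = (hi - lo + 1).bit_length() - 1
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# The slide numbers positions from 1, so 1 is added to the 0-based index.
print(rmq(array, table, 0, 4) + 1)  # 3
print(rmq(array, table, 3, 4) + 1)  # 5
```

The two overlapping blocks of length 2**k cover the whole query range, which is why a single comparison per query is enough.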

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9} we have the positions A = {4, 12, 20} and B = {4, 18, 22}.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤

• RMQ(4, 4) = 4, result so far {4}
• RMQ(4, 18) = 15, result so far {4, 15}
• RMQ(20, 18) = 19, result so far {4, 15, 19}
• Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the antichain {C1, C13}
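The pairwise procedure can be sketched in Python as follows, reading LCS as the least common subsumer of two classes. The tour and depth arrays are copied from the slide; everything else is an assumption made for illustration: the names `FIRST`, `lcs`, `subsumes` and `intersect` are hypothetical, "T" stands for the top element, and the range minimum is found by a plain scan rather than a constant-time RMQ structure.

```python
from itertools import product

# Euler tour and depths copied from the slide (the slide counts positions from 1).
TOUR = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
        "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First position of every class in the tour (0-based).
FIRST = {}
for pos, node in enumerate(TOUR):
    FIRST.setdefault(node, pos)

def lcs(x, y):
    """Least common subsumer: the shallowest node between the two positions."""
    lo, hi = sorted((FIRST[x], FIRST[y]))
    best = min(range(lo, hi + 1), key=DEPTH.__getitem__)  # plain scan, not O(1)
    return TOUR[best]

def subsumes(general, specific):
    """True when `general` is `specific` itself or one of its ancestors."""
    return lcs(general, specific) == general

def intersect(a, b):
    """One LCS per pair (x, y), then keep only the most specific results."""
    candidates = {lcs(x, y) for x, y in product(a, b)}
    return {c for c in candidates
            if not any(c != other and subsumes(c, other) for other in candidates)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect(A, B)))  # ['C1', 'C13']
```

Replacing the scan inside `lcs` by a constant-time RMQ leaves the number of queries at one per pair of elements.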

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: tree with top ⊤; below ⊤ are C12 and C6; below C12 are C10 (over C1, C2) and C11 (over C4, C5); below C6 is C13 (over C7, C8, C9)]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}
B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
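The same result can be reproduced through filters with a short Python sketch. The parent links below are read off the tree on the slide; the names `PARENT`, `ancestors_or_self`, `filter_of` and `most_specific` are hypothetical, and "T" again stands for the top element.

```python
# Child -> parent links read off the tree ("T" is the top element).
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def ancestors_or_self(node):
    """The node followed by all of its ancestors up to the top."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def filter_of(antichain):
    """All classes that are at least as general as some element of the antichain."""
    return {n for element in antichain for n in ancestors_or_self(element)}

def most_specific(nodes):
    """The nodes that are not a proper ancestor of another node in the set."""
    return {n for n in nodes
            if not any(n != m and n in ancestors_or_self(m) for m in nodes)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(most_specific(common)))  # ['C1', 'C13']
```

Each filter can contain every node of the tree, which is the reason the slides call this route more costly than working on the antichains directly.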

Complexity

Complexity of Naive Approach

The number of RMQs is O(|A||B|), one per pair of elements.

Complexity of Improved Approach

The cardinality of A ∪ B is at most |A| + |B|, hence the number of consecutive elements is O(|A| + |B|), and thus the number of RMQs of consecutive elements is O(|A| + |B|).

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than the O(|Leaves(T)|) needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e. papers on Question Answering

[Figure: concept lattice over the concepts K0 to K11, drawn in levels: K0; K1, K2; K4, K3, K5; K8; K10, K7, K6, K9; K11]

KID  Extent          Intent
K1   s1, s2, s4, s5  (p1, {C14})
K2   s1, s3, s4      (p1, {C12})
K3   s1, s4          (p1, {C7, C12})
K4   s2, s4, s5      (p1, {C8})
K5   s3, s4          (p1, {C4})
K6   s1              (p1, {C1, C2, C7})
K8   s2, s5          (p1, {C8, C9})
K7   s4              (p1, {C4, C7, C8})
K9   s3              (p1, {C4, C5})
K10  s2              (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.



4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query: ?paper is linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping µ: V → U

Given U and V, a mapping µ is a partial function µ: V → U, e.g. µ1: ?keywords → Classification

[Figure: one match of the pattern: NapoliLS97 is linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title  ?keyword            ?author
title1  RDF                 author1
title2  Web Crawling        author1
title2  Web Search Engines  author2
title2  RDF                 author1
title2  RDF                 author2
title2  Web Crawling        author1

• Ev = {?title}
• Av = {?author, ?keyword}
• G = µ(Ev) = {title1, title2, ...}
• M1 = µ(Av1) = {author1, author2, ...}
• M2 = µ(Av2) = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
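How a VIEW BY clause turns result rows into a formal context can be sketched in Python. This is an illustration under assumptions, not the authors' implementation: the rows are the ones listed in the result table, and the names `ROWS`, `COLUMNS` and `view_by` are hypothetical. The values of the VIEW BY variable become the objects G, and the values of the remaining variables become the attributes.

```python
from collections import defaultdict

# Result rows (title, keyword, author) as listed in the result table.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")

def view_by(rows, columns, object_column):
    """Group rows by the VIEW BY column; the other columns become attributes."""
    key = columns.index(object_column)
    context = defaultdict(set)
    for row in rows:
        for i, value in enumerate(row):
            if i != key:
                context[row[key]].add((columns[i], value))
    return dict(context)

by_title = view_by(ROWS, COLUMNS, "title")    # VIEW BY ?title
by_author = view_by(ROWS, COLUMNS, "author")  # VIEW BY ?author
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Changing only the `object_column` argument gives the same answers organized from another point of view, which is what switching between VIEW BY ?title and VIEW BY ?author does.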

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x is linked to Person by rdf:type, to Berlin by dbp:birthPlace and to a date ≤ 1900 by dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x is linked to FrenchFilm by dcterms:subject]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: ?x is additionally linked to Film by rdf:type and to France by dbo:hasCountry]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates             Objects
Index  URI             Index  URI
A      dc:subject      a      dbpc:Computer_Scientists
                       b      dbpc:Turing_Award_Laureates
B      dbp:award       c      dbp:TuringAward
C      rdf:type        d      dbo:Scientist
D      dbp:field       e      dbp:Computer Sciences
E      dbp:birthPlace  f      dbo:UnitedStates
                       g      dbo:UnitedKingdom

Entity   a-g (A: a, b; B: c; C: d; D: e; E: f, g)
Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples, after scaling, from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
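The scaling step can be sketched in Python as follows: every (predicate, object) pair becomes one binary attribute of the subject. Only the four Person1 triples shown on the slide are used, and the names `TRIPLES` and `build_context` are hypothetical.

```python
# The four triples shown for Person1 (prefixes written as on the slide).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]

def build_context(triples):
    """Each (predicate, object) pair becomes one binary attribute of the subject."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context

context = build_context(TRIPLES)
attributes = sorted({attr for attrs in context.values() for attr in attrs})

# Print one row per subject, with an "x" for every triple that is present.
for subject, attrs in context.items():
    print(subject, "".join("x" if a in attrs else "." for a in attributes))
```

Adding the triples of the other persons to `TRIPLES` would extend the same table row by row.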

Getting definitions from implications and association rules

• If X ⟹ Y and Y ⟹ X, then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y holds and Y → X has high confidence, the data may be incomplete

Rule         Confidence  Support  Meaning
d ⟹ c        100%        5        Every scientist has won a Turing Award
c ⟹ d        100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d     100%        2        Every person whose field is computer science is a Turing Award-winning scientist
c, d ⟹ a, b  100%        5        All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - c, d ⇒ a, b
  - conf({a, b} → {c, d}) = 0.71
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
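The confidence value and the entities needing completion can be recomputed with a small Python sketch. It rests on an assumption: only the attributes a to d are modelled, placed as the supports in the rule table suggest (all seven persons carry a and b, five of them also carry c and d); the attributes e, f and g are left out. The names `CONTEXT`, `support`, `confidence` and `candidates_for_completion` are hypothetical.

```python
# Attributes a-d of the running example; e, f and g are deliberately left out.
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def support(attributes):
    """Number of entities that carry all the given attributes."""
    return sum(1 for attrs in CONTEXT.values() if attributes <= attrs)

def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)

def candidates_for_completion(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return sorted(e for e, attrs in CONTEXT.items()
                  if premise <= attrs and not conclusion <= attrs)

ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab))                 # 1.0
print(round(confidence(ab, cd), 2))       # 0.71
print(candidates_for_completion(ab, cd))  # ['Person6', 'Person7']
```

One direction holds with confidence 1 and the other with 5/7, so treating a, b ≡ c, d as a definition amounts to adding c and d to Person6 and Person7.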

Possible Scenario [Alam et al IJCAI 2015]

ReferenceUniverse

FormalContext

MiningImplications

RankingImplications

Can a rule be adenitionX rdquo Y

Yes

CompleteData

Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov

Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 39

Experimentation

Dataset Cars Videogames Smartphones Countries

Dataset building conditions

Restriction dcsubject dcsubject dcsubject rdftype

dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype

dcsubject dcsubject dcsubject dcsubject

bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset Cars Videogames Smartphones Countries

Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982

Dataset characteristics

Mehwish Alam Interactive Knowledge Discovery over Web of Data 40

Evaluation

sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall

sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions

00 02 04 06 08 10

06

07

08

09

10

Recall

Pre

cisi

on

CarsSmartphonesCountriesVideogames

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 42

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6DataAnalysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account
• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around 350GT: dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Entity               Crosses (×) over attributes a-g of K_A ... K_E   K_dbo:productionStartYear
Reventon             × × × × × ×                                      ⟨[2008, 2008]⟩
Countach             × × × × ×                                        ⟨[1974, 1974]⟩
350GT                × × × × ×                                        ⟨[1963, 1963]⟩
400GT                × × × ×                                          ⟨[1965, 1965]⟩
Islero               × × × ×                                          ⟨[1967, 1967]⟩
Veneno               × ×                                              ⟨[2012, 2012]⟩
Aventador Roadster   × ×                                              -

Heterogeneous context with numeric values (the column positions of the crosses are not recoverable from this transcript)

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \bigtimes_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$ in the heterogeneous context with numeric values. Then:

• $(A_1)^{\circ\circ}$ (binary contexts):
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$ (interval pattern structure):
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
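The closure computed on this slide can be retraced with a short Python sketch. It only models the interval part (dbo:productionStartYear) from the slide's table; the extent of {a, b, c, d} in the binary contexts is copied from the slide rather than recomputed, because the positions of the crosses are not available here. The names `describe` and `extent` are illustrative and do not come from the original work.

```python
# Production start years as listed in the slide's heterogeneous context.
# Aventador Roadster has no value ("-") and is therefore left out.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}


def describe(entities):
    """Interval description shared by a set of entities (convex hull of years)."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent(interval):
    """Entities whose production start year lies inside the interval."""
    low, high = interval
    return {e for e, year in YEARS.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
interval = describe(A1)              # (A1)'
interval_closure = extent(interval)  # (A1)''

# Extent of {a, b, c, d} in the binary contexts, copied from the slide.
binary_closure = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

# Heterogeneous closure: intersection of the per-context closures.
heterogeneous_closure = binary_closure & interval_closure
print(interval, sorted(heterogeneous_closure))
```

The intersection step mirrors the last closure line of the slide: an entity stays in the heterogeneous extent only if every component context keeps it.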

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S   C1  C2  C4  C5  C6  C7  C8  C9
s1           x   x                   x
s2                           x       x   x
s3                   x   x
s4                   x           x   x
s5                                   x   x

[Figure: subsumption hierarchy rooted at ⊤, with C12 above C10 (C1, C2) and C11 (C4, C5), and C15 above C13 (C6) and C14 (C7, C8, C9)]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
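A minimal sketch of RMQ on the slide's array follows. It uses a sparse table, which is a swapped-in, simpler technique: preprocessing costs O(n log n) rather than the O(n) scheme the slide refers to, while each query is still answered in O(1). The function names are illustrative, and positions are 0-based here, so the slide's position 3 appears as index 2.

```python
def build_sparse_table(values):
    """table[j][i] = index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Index of a minimal value in values[lo..hi] (inclusive, 0-based)."""
    j = (hi - lo + 1).bit_length() - 1
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


ARRAY = [2, 1, 0, 3, 2]          # the array shown on the slide
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 0, 4))   # minimum over the whole array
```

Each query combines two overlapping power-of-two windows that together cover the range, which is why no loop is needed at query time.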

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the array below are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D:          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Label:      ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e., the antichain $\{C_1, C_{13}\}$
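The pairwise computation can be sketched in Python as below. The tree is read off the slide's label row, and LCS is taken here to mean the lowest common ancestor of two nodes, which is an assumption about the abbreviation. For brevity the range minimum is found by scanning the slice instead of using a constant-time RMQ structure, and every pair (a, b) with a in A and b in B is queried, so this corresponds to the O(|A||B|) naive variant. All function names are illustrative.

```python
from itertools import product

# Tree read off the slide's Euler tour; "TOP" stands for the root ⊤.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the label and depth arrays of a depth-first tour."""
    labels, depths = [], []

    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            labels.append(node)
            depths.append(depth)

    visit(root, 0)
    return labels, depths


LABELS, DEPTHS = euler_tour("TOP")
FIRST = {}
for index, label in enumerate(LABELS):
    FIRST.setdefault(label, index)  # 0-based first occurrence


def lcs(a, b):
    """Lowest common ancestor via a range minimum over the depth array."""
    lo, hi = sorted((FIRST[a], FIRST[b]))
    position = min(range(lo, hi + 1), key=DEPTHS.__getitem__)
    return LABELS[position]


def is_strict_ancestor(x, y):
    return x != y and lcs(x, y) == x


def meet(antichain_a, antichain_b):
    """Most specific common subsumers of two antichains (pairwise variant)."""
    candidates = {lcs(a, b) for a, b in product(antichain_a, antichain_b)}
    return {c for c in candidates
            if not any(is_strict_ancestor(c, d) for d in candidates)}


print(sorted(meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

The tour produced by `euler_tour` has the same 25 entries as the slide's D and label rows; the only difference is that indices start at 0, so C1 sits at index 3 rather than position 4.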

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element of the antichain

[Figure: tree rooted at ⊤ with children C12 and C6; C12 has children C10 (C1, C2) and C11 (C4, C5); C6 has child C13 (C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
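The scaling to filters can be checked with a small Python sketch. The parent map encodes the tree of this slide, and the helper names (`up_set`, `filter_of`, `most_specific`) are illustrative rather than taken from the original work.

```python
# Parent of each node in the slide's tree; "TOP" stands for the root ⊤.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    result = set()
    for element in antichain:
        result |= up_set(element)
    return result


def most_specific(elements):
    """Drop every element that is a strict ancestor of another element."""
    return {e for e in elements
            if not any(e != other and e in up_set(other) for other in elements)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(most_specific(common)))
```

Both filters grow with the height of the tree, which is the cost the complexity discussion attributes to this scaling.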

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is costlier than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS
• Consider that the analyst is not interested in C7, i.e., papers on Question Answering

[Figure: concept lattice with K0 at the top, followed by the levels K1 K2; K4 K3 K5; K8; K10 K7 K6 K9; and K11 at the bottom]

KID   Extent           Intent
K1    s1, s2, s4, s5   (p1, {C14})
K2    s1, s3, s4       (p1, {C12})
K3    s1, s4           (p1, {C7, C12})
K4    s2, s4, s5       (p1, {C8})
K5    s3, s4           (p1, {C4})
K6    s1               (p1, {C1, C2, C7})
K8    s2, s5           (p1, {C8, C9})
K7    s4               (p1, {C4, C7, C8})
K9    s3               (p1, {C4, C5})
K10   s2               (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.



4 Creating Views over RDF-Graphs

SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER(
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: graph pattern of the query: ?paper linked to ?author by dc:creator, to ?title by dc:title and to ?keywords by dc:subject]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g. $\mu_1 : ?keywords \to Classification$.

[Figure: instantiated pattern: NapoliLS97 linked to Amedeo_Napoli by dc:creator, to title1 by dc:title and to Classification by dc:subject]

Lattice-Based View Access [Alam et al. CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

?title   ?keyword             ?author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$
• $A_v = \{?author, ?keyword\}$
• $G = \mu(E_v) = \{title1, title2, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
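How a VIEW BY clause could split the answer table into objects and attribute sets can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the result rows are hypothetical, and `view_by` is an invented helper that treats the chosen variable as the object variable and every remaining variable as one attribute set.

```python
from collections import defaultdict

# Hypothetical result rows in the order (title, keyword, author).
VARIABLES = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]


def view_by(rows, variables, object_variable):
    """Return (objects, contexts) with one binary context per other variable.

    contexts[var][obj] is the set of values of `var` that co-occur with `obj`.
    """
    obj_index = variables.index(object_variable)
    contexts = {v: defaultdict(set) for v in variables if v != object_variable}
    for row in rows:
        for index, variable in enumerate(variables):
            if index != obj_index:
                contexts[variable][row[obj_index]].add(row[index])
    objects = sorted({row[obj_index] for row in rows})
    return objects, {v: dict(c) for v, c in contexts.items()}


objects, contexts = view_by(ROWS, VARIABLES, "title")
print(objects)
print(contexts["author"])
print(contexts["keyword"])
```

Calling `view_by(ROWS, VARIABLES, "author")` on the same rows gives the author-centred perspective, with titles and keywords as the attribute sets.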

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") ||
             regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER(regex(?keyword, "Web Crawling", "i") ||
             regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x with rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: ?x linked to FrenchFilm by dcterms:subject; equivalently, ?x with rdf:type Film and dbo:hasCountry France]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index   URI              Index   URI
A       dc:subject       a       dbpc:Computer_Scientists
                         b       dbpc:Turing_Award_Laureates
B       dbp:award        c       dbp:TuringAward
C       rdf:type         d       dbo:Scientist
D       dbp:field        e       dbp:Computer Sciences
E       dbp:birthPlace   f       dbo:UnitedStates
                         g       dbo:UnitedKingdom

Entity    Crosses (×) over the objects a-g of predicates A-E
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>. The column positions of the crosses are not recoverable from this transcript.
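The scaling step from triples to a binary context can be sketched in a few lines of Python. The four triples are the ones listed on the slide; the helper names are illustrative, and treating each (predicate, object) pair as one attribute is an assumption about how the scaling is done.

```python
# The four example triples from the slide, as (subject, predicate, object).
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]


def build_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


def attribute_list(context):
    """All attributes occurring in the context, in a stable order."""
    return sorted({attr for attrs in context.values() for attr in attrs})


context = build_context(TRIPLES)
attributes = attribute_list(context)

# Print one row of crosses per subject, one column per attribute.
for subject, attrs in context.items():
    row = ["x" if attribute in attrs else "." for attribute in attributes]
    print(subject, " ".join(row))
```

Feeding more triples with further subjects extends the context row by row, and each cross still traces back to exactly one triple.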

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \to X$ has a high confidence, the data may be incomplete

Rule             Confidence   Support   Meaning
d ⟹ c            100%         5         Every scientist has won a Turing Award
c ⟹ d            100%         5         Every person who has won a Turing Award is a scientist
e ⟹ c, d         100%         2         Every person whose field is computer science is a Turing Award winning scientist
c, d ⟹ a, b      100%         5         All the scientists who won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d      71%          7         71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \to \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their descriptions
• In fact there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
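The confidence values above can be retraced with a short Python sketch. The context below is a reconstruction restricted to the attributes a to d: the rule table implies that all seven persons have a and b and that five of them also have c and d, and the slide names Person6 and Person7 as the entities needing completion. The attributes e, f and g are left out because their columns cannot be read from the transcript, and the function names are illustrative.

```python
# Reconstructed context restricted to attributes a-d (an assumption, see above).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities that have all the given attributes."""
    return {g for g, description in CONTEXT.items() if attributes <= description}


def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


def needs_completion(premise, conclusion):
    """Entities matching the premise but lacking part of the conclusion."""
    return extent(premise) - extent(premise | conclusion)


AB, CD = {"a", "b"}, {"c", "d"}
print(round(confidence(AB, CD), 2), confidence(CD, AB))
print(sorted(needs_completion(AB, CD)))
```

If the near-definition is accepted as a definition, the entities returned by `needs_completion` are the ones whose descriptions would receive the missing attributes.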

Possible Scenario [Alam et al. IJCAI 2015]

[Figure: workflow: Reference Universe, Formal Context, Mining Implications, Ranking Implications, decision "Can a rule be a definition $X \equiv Y$?", and on "Yes", Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset Cars Videogames Smartphones Countries

Dataset building conditions

Restriction dcsubject dcsubject dcsubject rdftype

dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype

dcsubject dcsubject dcsubject dcsubject

bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset Cars Videogames Smartphones Countries

Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982

Dataset characteristics

Mehwish Alam Interactive Knowledge Discovery over Web of Data 40

Evaluation

sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall

sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions

00 02 04 06 08 10

06

07

08

09

10

Recall

Pre

cisi

on

CarsSmartphonesCountriesVideogames

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 42

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6DataAnalysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45

main_topic_orpailleurmp4
Media File (videomp4)
altering_nav_spacemp4
Media File (videomp4)
nav_across_point_of_viewmp4
Media File (videomp4)
hide_partsmp4
Media File (videomp4)
search_concept_latticemp4
Media File (videomp4)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7Conclusion and Perspectives

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• Handle near-definitions, i.e., cases where $X \to Y$ and $Y \to X$ are both association rules with high confidence.

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve the frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account.

• Perform large-scale experiments.

Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may contain different data types and structures, including dates, numbers, collections, and strings.

• To deal with such data we use Pattern Structures.

[Figure: RDF graph around the entity 350GT. Nodes: dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini, and the literal 1964 (xsd:date). Edge labels: dc:subject (twice), rdf:type, dbo:manufacturer, dbo:productionYear.]

Why Pattern Structures

Heterogeneous context with numeric values

Columns: K_A, K_B, K_C, K_D, K_E (binary attributes a b c d e f g) and K_dbo:productionStartYear (interval of years)

Reventon              × × × × × ×    ⟨[2008, 2008]⟩
Countach              × × × × ×      ⟨[1974, 1974]⟩
350GT                 × × × × ×      ⟨[1963, 1963]⟩
400GT                 × × × ×        ⟨[1965, 1965]⟩
Islero                × × × ×        ⟨[1967, 1967]⟩
Veneno                × ×            ⟨[2012, 2012]⟩
Aventador Roadster    × ×            -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
- if $range(p) \subseteq U$, $p$ is an object relation
- if $range(p) \subseteq L$, $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
- $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple in which each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
- $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
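The interval part of the closure above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary `YEARS` copies the production start years from the heterogeneous context, while the helper names `describe` and `extent` are hypothetical and not taken from the presentation. The binary part of the context is left out because the column positions of its crosses are not recoverable here.

```python
# Production start years as read from the heterogeneous context on the slide.
# None marks the entity without a value (Aventador Roadster).
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    "Aventador Roadster": None,
}


def describe(entities):
    """Interval pattern shared by the entities: (min year, max year).

    Assumes every entity passed in has a known year.
    """
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent(interval):
    """All entities whose year falls inside the interval."""
    lo, hi = interval
    return {e for e, y in YEARS.items() if y is not None and lo <= y <= hi}


A1 = {"350GT", "400GT", "Islero"}
interval = describe(A1)      # description of A1 in the interval pattern structure
closure = extent(interval)   # entities covered by that description
print(interval, sorted(closure))
```

Applying `describe` and then `extent` is the closure of the interval pattern structure, so `closure` coinciding with `A1` mirrors the statement that the double derivation of A1 is A1 itself.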

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S | C1 | C2 | C4 | C5 | C6 | C7 | C8 | C9
s1         | x  | x  |    |    |    | x  |    |
s2         |    |    |    |    | x  |    | x  | x
s3         |    |    | x  | x  |    |    |    |
s4         |    |    | x  |    |    | x  | x  |
s5         |    |    |    |    |    |    | x  | x

[Figure: subsumption hierarchy with nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$, where $y$ ranges over these maximal elements.

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \in M \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.
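The infimum of two antichains defined above can be illustrated on a small poset. The sketch below is an assumption-laden toy: the poset (the divisors of 12 ordered by divisibility) and the function names `down_set`, `maximal`, and `meet` are chosen for illustration only and do not come from the presentation.

```python
# Toy poset (an assumption for illustration): divisors of 12, ordered by
# divisibility, so x <= y means "x divides y".
ELEMENTS = [1, 2, 3, 4, 6, 12]


def leq(x, y):
    """Order relation of the toy poset."""
    return y % x == 0


def down_set(antichain):
    """Order ideal generated by an antichain: everything below some element."""
    return {m for m in ELEMENTS if any(leq(m, a) for a in antichain)}


def maximal(elements):
    """Maximal elements of a subset, which always form an antichain."""
    return {x for x in elements
            if not any(x != y and leq(x, y) for y in elements)}


def meet(ac1, ac2):
    """Infimum of two antichains: maximal elements of the common order ideal."""
    return maximal(down_set(ac1) & down_set(ac2))


print(meet({4, 6}, {3, 4}))
print(meet({4, 6}, {12}))
```

Here `down_set(ac1) & down_set(ac2)` is exactly the set of elements m lying below some element of each antichain, and `maximal` turns that ideal back into an antichain.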

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

Such a position can be found with $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
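As an illustration of constant-time range minimum queries, here is a sketch of a sparse table over the example array. Note the swap: a sparse table needs O(n log n) preprocessing rather than the O(n) bound quoted above, which requires a more involved construction; it is used here only because it is short. The function names are hypothetical.

```python
def build_sparse_table(values):
    """table[j][i] = index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the minimum of values[lo..hi] (inclusive), in O(1)."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]            # the array from the slide
table = build_sparse_table(values)
position = rmq(values, table, 0, 4) + 1   # convert to the slide's 1-based positions
print(position)
```

Each query combines two overlapping power-of-two windows that together cover the range, which is why no loop is needed at query time.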

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9} we have the positions A = {4, 12, 20} and B = {4, 18, 22}.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

• RMQ(4, 4) = 4, partial result {4}
• RMQ(4, 18) = 15, partial result {4, 15}
• RMQ(20, 18) = 19, partial result {4, 15, 19}
• Result = {4, 8, 15, 19, 21} = {4, 21}, i.e., the antichain {C1, C13}
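The naive procedure can be sketched in Python as follows. The tour and depth arrays are copied from the slide (with ⊤ written as `TOP`); everything else is an assumption made for illustration: the helper names are hypothetical, the RMQ is a plain linear scan rather than a constant-time structure, and the final filtering step keeps only the most specific candidates, which is how the set {4, 8, 15, 19, 21} is read as {4, 21} here.

```python
# Euler tour of the class tree and the depth of each visited node (from the slide).
TOUR = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
        "TOP C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1,
         0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last tour index (0-based) of every node.
FIRST, LAST = {}, {}
for i, node in enumerate(TOUR):
    FIRST.setdefault(node, i)
    LAST[node] = i


def rmq(i, j):
    """Index of the minimum depth between tour indices i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=DEPTH.__getitem__)


def lcs(u, v):
    """Least common subsumer of two classes via RMQ on the depth array."""
    return TOUR[rmq(FIRST[u], FIRST[v])]


def is_strict_ancestor(u, v):
    """True if u lies strictly above v in the tree."""
    return u != v and FIRST[u] <= FIRST[v] <= LAST[u]


def intersect_naive(A, B):
    """One RMQ per pair (a, b), then keep only the most specific candidates."""
    candidates = {lcs(a, b) for a in A for b in B}
    return {c for c in candidates
            if not any(is_strict_ancestor(c, other) for other in candidates)}


result = intersect_naive({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(result))
```

The double loop over A and B is what gives the quadratic number of range minimum queries.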

An associated scaling

• Scale the antichains to the corresponding filters.

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain.

[Figure: class tree with root ⊤; C12 and C6 below ⊤; C10 and C11 below C12; C1 and C2 below C10; C4 and C5 below C11; C13 below C6; C7, C8, and C9 below C13]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}.
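The filter-based computation can be reproduced with a short Python sketch. The parent map encodes the class tree as it can be read off the Euler tour used for the RMQ approach (⊤ is written as `TOP`); the function names `fil` and `minimal` are hypothetical.

```python
# Parent of every class in the tree (the root TOP has no parent).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """The node together with everything above it, up to TOP."""
    chain = {node}
    while node in PARENT:
        node = PARENT[node]
        chain.add(node)
    return chain


def fil(antichain):
    """Filter of an antichain: all elements above or equal to one of its members."""
    result = set()
    for node in antichain:
        result |= ancestors_or_self(node)
    return result


def minimal(elements):
    """Most specific elements: those with no other element strictly below them."""
    return {x for x in elements
            if not any(x != y and x in ancestors_or_self(y) for y in elements)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
print(sorted(common), sorted(minimal(common)))
```

Every filter walks up to the root, so its size depends on the whole tree rather than only on the antichain, which is the cost the complexity discussion refers to.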

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 50: Interactive Knowledge Discovery over Web of Data

Navigating Concept Lattice

• Entities are papers and descriptions are the classes from ACCS.

• Suppose that the analyst is not interested in C7, i.e., papers on Question Answering.

[Figure: concept lattice with concepts K0 to K11]

KID | Extent         | Intent
K1  | s1, s2, s4, s5 | (p1, {C14})
K2  | s1, s3, s4     | (p1, {C12})
K3  | s1, s4         | (p1, {C7, C12})
K4  | s2, s4, s5     | (p1, {C8})
K5  | s3, s4         | (p1, {C4})
K6  | s1             | (p1, {C1, C2, C7})
K8  | s2, s5         | (p1, {C8, C9})
K7  | s4             | (p1, {C4, C7, C8})
K9  | s3             | (p1, {C4, C5})
K10 | s2             | (p1, {C6, C8, C9})

Interactive Exploration over RDF Data using Formal Concept Analysis. Mehwish Alam, Amedeo Napoli. IEEE International Conference on Data Science and Advanced Analytics, 2015.

Roadmap

Building a classification structure over Web of Data
  • Classifying RDF Data
  • Creating Views through SPARQL Queries

Ways to utilize this structure
  • Data Completion
  • Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern of the query, with nodes ?paper, ?author, ?title, ?keywords and edge labels dc:creator, dc:title, dc:subject]

Mapping $\mu : V \to U$

Given U and V, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification.

[Figure: instantiated graph pattern, with nodes NapoliLS97, Amedeo_Napoli, title1, Classification and edge labels dc:creator, dc:title, dc:subject]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword           | ?author
title1 | RDF                | author1
title2 | Web Crawling       | author1
title2 | Web Search Engines | author2
title2 | RDF                | author1
title2 | RDF                | author2
title2 | Web Crawling       | author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \ldots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \ldots\}$

• $M_2 = \mu(A_{v2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
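How a VIEW BY clause turns query results into a formal context can be sketched in Python. The rows below copy the result tuples listed on the slide in the order title, keyword, author; the function `view_by` and the representation of attributes as (variable, value) pairs are assumptions made for this sketch, not the authors' implementation.

```python
from collections import defaultdict

# Result tuples (title, keyword, author) as listed on the slide.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
COLUMNS = ("title", "keyword", "author")


def view_by(rows, columns, object_var):
    """Build a formal context from query results.

    Objects are the values bound to object_var; attributes are
    (variable, value) pairs taken from the remaining variables.
    """
    idx = columns.index(object_var)
    context = defaultdict(set)
    for row in rows:
        for j, value in enumerate(row):
            if j != idx:
                context[row[idx]].add((columns[j], value))
    return dict(context)


by_title = view_by(ROWS, COLUMNS, "title")     # G = titles, M = authors and keywords
by_author = view_by(ROWS, COLUMNS, "author")   # G = authors, M = titles and keywords
print(sorted(by_title["title1"]))
print(sorted(by_author["author2"]))
```

Changing only the `object_var` argument yields a different context over the same result set, which is the idea behind viewing the same answers from different perspectives.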

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern with nodes ?x, Person, Berlin, and the constraint ≤ 1900; edge labels rdf:type, dbp:birthPlace, dbp:birthDate]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern with nodes ?x and FrenchFilm; edge label dcterms:subject]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

French Films, described by type and country instead of by category

[Figure: graph pattern with nodes ?x, Film, France; edge labels rdf:type, dbo:hasCountry]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates            | Objects
Index | URI           | Index | URI
A     | dc:subject    | a     | dbpc:Computer_Scientists
      |               | b     | dbpc:Turing_Award_Laureates
B     | dbp:award     | c     | dbp:TuringAward
C     | rdf:type      | d     | dbo:Scientist
D     | dbp:field     | e     | dbp:Computer_Sciences
E     | dbp:birthPlace| f     | dbo:UnitedStates
      |               | g     | dbo:UnitedKingdom

Formal context (columns: A = {a, b}, B = {c}, C = {d}, D = {e}, E = {f, g})

Person1    × × × × × ×
Person2    × × × × ×
Person3    × × × × ×
Person4    × × × ×
Person5    × × × ×
Person6    × ×
Person7    × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \to X$ has high confidence, the data may be incomplete.

Association rules for the running example

Rule                  | Confidence | Support | Meaning
$d \Rightarrow c$     | 100%       | 5       | Every scientist has won a Turing Award
$c \Rightarrow d$     | 100%       | 5       | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$  | 100%       | 2       | Every person whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$ | 100%     | 5       | All scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \to c, d$       | 71%        | 7       | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

• Example:
- $c, d \Rightarrow a, b$
- $conf(\{a, b\} \to \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description.
• In fact there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
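The confidence value of the running example can be recomputed with a small Python sketch. The context below is restricted to the attributes a, b, c, d, under the reading that Person1 to Person5 carry all four and Person6 and Person7 carry only a and b, which is what the rule table and the completion remark imply. The function names are hypothetical.

```python
# Running example restricted to attributes a-d (an assumption of this sketch):
# Person1..Person5 have a, b, c, d; Person6 and Person7 have only a, b.
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT["Person6"] = {"a", "b"}
CONTEXT["Person7"] = {"a", "b"}


def extent(attributes):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attributes <= desc}


def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion)) / len(extent(premise))


def candidates_for_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(extent(premise) - extent(conclusion))


conf_ab_cd = confidence({"a", "b"}, {"c", "d"})
conf_cd_ab = confidence({"c", "d"}, {"a", "b"})
to_complete = candidates_for_completion({"a", "b"}, {"c", "d"})
print(round(conf_ab_cd, 2), conf_cd_ab, to_complete)
```

A rule that holds with confidence 1 in one direction and high confidence in the other is the pattern flagged above as a candidate definition, and the entities returned by `candidates_for_completion` are the ones whose descriptions would have to be completed.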

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Predicates used to construct each dataset

Dataset     | Cars             | Videogames  | Smartphones      | Countries
Restriction | dc:subject       | dc:subject  | dc:subject       | rdf:type
            | dbpc:Sports_cars | dbpc:FPS    | dbpc:Smartphones | Country
Predicates  | rdf:type         | rdf:type    | rdf:type         | rdf:type
            | dc:subject       | dc:subject  | dc:subject       | dc:subject
            | bodyStyle        | cp          | manufacturer     | language
            | transmission     | developer   | operativeSystem  | governmentType
            | assembly         | requirement | developer        | leaderType
            | designer         | genre       | cpu              | foundingDate
            | layout           | releaseDate |                  | gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Dataset characteristics

Dataset       | Cars  | Videogames | Smartphones | Countries
Subjects      | 529   | 655        | 363         | 3153
Objects       | 1291  | 3265       | 495         | 8315
Triples       | 12519 | 20146      | 4710        | 50000
Concepts      | 14657 | 31031      | 1232        | 13754
Exec time [s] | 1732  | 1714       | 07          | 5982

Evaluation

• There is no ground truth for the triples to be added to each dataset.
• Human evaluation serves as the ground truth; each list of implications has a recall of 100%.

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries, and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team

• Altering the navigation space

• Navigation across points of view

• Hiding non-interesting parts of the lattice

• Search capabilities




From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• Objects may have different data types and structures, including dates, numbers, collections, and strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph for the entity 350GT: dc:subject edges to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to the literal 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values. The binary part is split into the contexts $K_A$ (attributes a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e) and $K_E$ (f, g); the last column is $K_{dbo:productionStartYear}$.

Entity               Crosses over a-g   dbo:productionStartYear
Reventon             × × × × × ×        $\langle [2008, 2008] \rangle$
Countach             × × × × ×          $\langle [1974, 1974] \rangle$
350GT                × × × × ×          $\langle [1963, 1963] \rangle$
400GT                × × × ×            $\langle [1965, 1965] \rangle$
Islero               × × × ×            $\langle [1967, 1967] \rangle$
Veneno               × ×                $\langle [2012, 2012] \rangle$
Aventador Roadster   × ×                -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $range(p) \subseteq U$, $p$ is an object relation
  - if $range(p) \subseteq L$, $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Using the heterogeneous context with numeric values, let $A_1 = \{350GT, 400GT, Islero\}$. Then:

• $(A_1)^{\circ\circ}$ (binary part)
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$ (interval part)
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
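The interval part of this example can be replayed with a few lines of Python. This is only a minimal sketch: the helper names `interval_of` and `extent_of` are hypothetical, the years are copied from the heterogeneous context, and the binary closure is taken as given from the slide rather than recomputed (the positions of the crosses are not available here).

```python
# Production start years copied from the heterogeneous context (None = no value).
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}

def interval_of(entities):
    """(A)': smallest interval covering the years of all entities in A."""
    years = [YEARS[e] for e in entities]
    if any(y is None for y in years):
        return None  # no common interval description
    return (min(years), max(years))

def extent_of(interval):
    """[lo, hi]': entities whose year lies inside the interval."""
    lo, hi = interval
    return {e for e, y in YEARS.items() if y is not None and lo <= y <= hi}

A1 = {"350GT", "400GT", "Islero"}
# Closure of A1 in the binary part, as stated on the slide (assumed, not recomputed).
binary_closure = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

interval = interval_of(A1)            # (1963, 1967)
interval_closure = extent_of(interval)
hetero_closure = binary_closure & interval_closure
```

Intersecting the two closures gives back $A_1$, which is why $A_1$ is the extent of a heterogeneous pattern concept in this example.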

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$, organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S over the attributes C1, C2, C4, C5, C6, C7, C8, C9:

s1   x x x
s2   x x x
s3   x x
s4   x x x
s5   x x

[Figure: subsumption hierarchy with top element $\top$ over the nodes C1, C2 and C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain; conversely, each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
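To make the RMQ problem concrete, here is a minimal Python sketch. It uses a sparse table, a simpler scheme than the $O(n)$-preprocessing structure mentioned on the slide: preprocessing takes $O(n \log n)$ time, while each query is still answered in $O(1)$. The function names are hypothetical, and positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, i, j):
    """1-based position of the minimum between positions i and j (inclusive)."""
    lo, hi = min(i, j) - 1, max(i, j) - 1
    k = (hi - lo + 1).bit_length() - 1      # largest power of two fitting the range
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1

ARRAY = [2, 1, 0, 3, 2]                      # array from the slide
TABLE = build_sparse_table(ARRAY)
whole_range = rmq(ARRAY, TABLE, 1, 5)        # position 3 holds the minimum 0
last_two = rmq(ARRAY, TABLE, 4, 5)           # position 5 holds the minimum 2
```

A query combines two overlapping blocks of length $2^k$ that together cover the range, which is why no loop over the range is needed at query time.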

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D        [ 0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0 ]
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

Steps shown for $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$:

• $RMQ(4, 4) = 4$, partial result $\{4\}$
• $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
• $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e., the antichain $\{C_1, C_{13}\}$ once the positions of nodes that subsume another result are discarded
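The naive approach can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' implementation: the tree is read off the node sequence above, `"T"` stands for the top element $\top$, the RMQ is a plain linear scan, and all function names are hypothetical.

```python
# Tree read off the slides; "T" stands for the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}

def euler_tour(root):
    """Node and depth sequences of a depth-first traversal with returns."""
    nodes, depths = [], []
    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)
    visit(root, 0)
    return nodes, depths

NODES, D = euler_tour("T")

def position(node):
    """1-based position of the first occurrence of a node in the tour."""
    return NODES.index(node) + 1

def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])

def proper_ancestors(node):
    out = set()
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out

def naive_intersection(A, B):
    """One RMQ per pair (a, b), then keep only the most specific nodes."""
    hits = {rmq(position(a), position(b)) for a in A for b in B}
    nodes = {NODES[p - 1] for p in hits}
    dominated = set().union(*(proper_ancestors(n) for n in nodes))
    return hits, nodes - dominated

hits, antichain = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

With the sets from the slide, `hits` contains the positions 4, 8, 15, 19 and 21, and `antichain` reduces to C1 and C13.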

An associated scaling

• Scale the antichains to the corresponding filters

• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: tree with root $\top$; $\top$ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
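The scaling-based computation can be checked with a short Python sketch. The parent map encodes the tree of the example, `"T"` stands for $\top$, and the helper names `filter_of` and `minimal_elements` are hypothetical.

```python
# Child -> parent map of the example tree; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T", "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def filter_of(antichain):
    """All elements greater than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        while True:
            result.add(node)
            if node not in PARENT:
                break
            node = PARENT[node]
    return result

def minimal_elements(nodes):
    """Elements of `nodes` with no other element of `nodes` below them."""
    return {
        n for n in nodes
        if not any(m != n and n in filter_of({m}) for m in nodes)
    }

fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
antichain = minimal_elements(common)
```

The intersection has to walk every ancestor of every element, which is the reason the filter-based approach depends on the size of the tree rather than on the size of the antichains.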

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures
Page 51: Interactive Knowledge Discovery over Web of Data

Roadmap

Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

4 Creating Views over RDF-Graphs

SPARQL

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?title ?keywords ?author
    WHERE {
      ?paper dc:creator ?author .
      ?paper dc:title ?title .
      ?paper dc:subject ?keywords .
      FILTER (
        regex(STR(?keywords), "pattern based classification", "i")
        || regex(STR(?keywords), "unsupervised classification", "i"))
    }

[Figure: graph pattern of the query: ?paper with edges dc:creator to ?author, dc:title to ?title and dc:subject to ?keywords]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g., $\mu_1$: ?keywords $\rightarrow$ Classification

[Figure: the graph pattern instantiated by a mapping: NapoliLS97 with dc:title title1, dc:creator Amedeo_Napoli and dc:subject Classification]

Lattice-Based View Access [Alam et al. CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

title    keyword              author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \ldots\}$

• $M_1 = \mu(A_{v_1}) = \{author1, author2, \ldots\}$

• $M_2 = \mu(A_{v_2}) = \{Web Crawling, RDF, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
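How a VIEW BY clause turns a result table into a formal context can be sketched in Python. This is a minimal illustration under assumptions: the rows are copied from the table on the slide, the function `view_by` and its return shape are hypothetical, and no claim is made about how the actual system is implemented.

```python
# Result rows copied from the slide: (title, keyword, author).
COLUMNS = ("title", "keyword", "author")
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]

def view_by(rows, columns, object_var):
    """Use one variable as the object set G, the others as attribute sets."""
    idx = columns.index(object_var)
    objects = set()
    attributes = {c: set() for c in columns if c != object_var}
    incidence = set()
    for row in rows:
        g = row[idx]
        objects.add(g)
        for col, value in zip(columns, row):
            if col != object_var:
                attributes[col].add(value)
                incidence.add((g, (col, value)))
    return objects, attributes, incidence

G, M, I = view_by(ROWS, COLUMNS, "title")
G_by_author, M_by_author, _ = view_by(ROWS, COLUMNS, "author")
```

Calling the same function with `"author"` instead of `"title"` gives the view from the other perspective: authors become the objects, while titles and keywords become the attributes.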

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x with rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate $\leq$ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: ?x with dcterms:subject FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

[Figure: ?x with rdf:type Film and dbo:hasCountry France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientists>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Formal context over the predicates A (a, b), B (c), C (d), D (e), E (f, g):

Entity    Crosses over a-g
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
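The scaling step can be illustrated with a minimal Python sketch that turns a list of triples into a formal context, where each attribute is a (predicate, object) pair. The function names are hypothetical and only the four triples listed for Person1 are used.

```python
# The four triples shown on the slide for Person1.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]

def build_context(triples):
    """Objects are subjects; attributes are (predicate, object) pairs."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence

def cross_table(objects, attributes, incidence):
    """One text row per object: 'x' where a triple exists, '.' otherwise."""
    rows = []
    for g in objects:
        marks = ["x" if (g, m) in incidence else "." for m in attributes]
        rows.append(g + " " + " ".join(marks))
    return rows

objects, attributes, incidence = build_context(TRIPLES)
table = cross_table(objects, attributes, incidence)
```

Adding triples for further subjects extends the table by one row per subject, with a cross wherever a matching triple exists.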

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | Every person having the field computer science is a Turing Award winner and a scientist
$c, d \Rightarrow a, b$ | 100% | 5 | All the Scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are Scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description
• In fact, there is such a definition $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
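The confidence values in the table can be recomputed with a small Python sketch. The exact positions of the crosses in the formal context are an assumption here: the context below is one hypothetical assignment chosen to match the row sizes and the supports reported on the slide. The function names are also hypothetical.

```python
# Hypothetical cross positions, consistent with the counts on the slide.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}

def extent(attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if set(attrs) <= desc}

def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    both = extent(set(premise) | set(conclusion))
    return len(both) / len(extent(premise))

def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but not the conclusion."""
    return extent(premise) - extent(set(premise) | set(conclusion))

conf_ab_cd = confidence("ab", "cd")         # 5 of 7 entities
conf_cd_ab = confidence("cd", "ab")         # an implication: every entity
candidates = needs_completion("ab", "cd")   # Person6 and Person7
```

Since $c, d \Rightarrow a, b$ holds and the converse rule has confidence 5/7, the entities returned by `needs_completion` are the ones whose descriptions would have to be completed for $a, b \equiv c, d$ to become a definition.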

Possible Scenario [Alam et al. IJCAI 2015]

[Figure: workflow: Reference Universe, Formal Context, Mining Implications, Ranking Implications, then the test "Can a rule be a definition $X \equiv Y$?"; if yes, Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    governmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset          Cars    Videogames   Smartphones   Countries
Subjects         529     655          363           3153
Objects          1291    3265         495           8315
Triples          12519   20146        4710          50000
Concepts         14657   31031        1232          13754
Exec. time [s]   1732    1714         07            5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation is used as the ground truth; each list of implications has 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) versus recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

For the heterogeneous context with numeric values, let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:

• $(A_1)^{\square\square}$:
- $(A_1)^{\square} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)^{\square\square} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$:
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
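The numeric part of the example, $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$, can be retraced with a minimal sketch. It assumes that the similarity of two year intervals is the smallest interval containing both, as is usual for interval pattern structures; the names `YEARS`, `meet`, `describe`, and `covered` are illustrative and not taken from the slides. Aventador Roadster is left out because its year is missing in the context.

```python
import functools

# Year intervals from the heterogeneous context (Aventador Roadster has no value).
YEARS = {
    "Reventon": (2008, 2008), "Countach": (1974, 1974), "350GT": (1963, 1963),
    "400GT": (1965, 1965), "Islero": (1967, 1967), "Veneno": (2012, 2012),
}

def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def describe(entities):
    """Common interval description of a non-empty set of entities."""
    return functools.reduce(meet, (YEARS[g] for g in entities))

def covered(interval):
    """Entities whose own interval lies inside the given interval."""
    lo, hi = interval
    return {g for g, (a, b) in YEARS.items() if lo <= a and b <= hi}

A1 = {"350GT", "400GT", "Islero"}
print(describe(A1))           # (1963, 1967)
print(covered(describe(A1)))  # the three entities of A1 again
```

Countach (1974) falls outside the interval, which is why the numeric closure returns exactly $A_1$ while the binary closure also contains Reventon and Countach.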

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Entities S | C1 C2 C4 C5 C6 C7 C8 C9
s1 | x x x
s2 | x x x
s3 | x x
s4 | x x x
s5 | x x

[Figure: subsumption hierarchy with top element ⊤ and the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• [Ganter and Kuznetsov 2001] introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, the attribute set $(M, \le_M)$ is assumed to be finite and partially ordered, and all attribute combinations that can occur must be order ideals (downsets) of this order.

• Any order ideal $Idl$ can then be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$.

• The maximal elements of an ideal are pairwise incomparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$.
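The infimum of two antichains can be computed directly from this definition. The sketch below is only an illustration under stated assumptions: it uses the tree that appears in the later RMQ slides, written as a child-to-parent map with "T" standing for ⊤, and it assumes that $m \le x$ means "m is x or an ancestor of x" (so ⊤ is the least element). The helper names are hypothetical.

```python
# Taxonomy as child -> parent; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def up_set(node):
    """Return the node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def leq(m, x):
    """m <= x in the order assumed here: m is x itself or an ancestor of x."""
    return m in up_set(x)

def ideal(antichain):
    """Order ideal generated by an antichain: everything below some element."""
    return set().union(*(up_set(x) for x in antichain))

def maximal(elements):
    """Keep the elements that are not strictly below another element."""
    return {m for m in elements
            if not any(m != x and leq(m, x) for x in elements)}

def infimum(ac1, ac2):
    """AC1 ⊓ AC2: maximal elements of the intersection of the two ideals."""
    return maximal(ideal(ac1) & ideal(ac2))

print(infimum({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))  # C1 and C13
```

This direct computation walks over whole ideals, which is the cost that the RMQ-based approach of the following slides avoids.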

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the Range Minimum Query (RMQ) problem.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array     [ 2 1 0 3 2 ]
Positions   1 2 3 4 5

Computational Time

Finding such a position takes $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
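As an illustration of constant-time queries, here is a minimal sparse-table sketch. Note that this is a swapped-in, simpler technique: a sparse table needs $O(n \log n)$ preprocessing rather than the $O(n)$ stated on the slide, while still answering each query in $O(1)$. The function names are hypothetical, and indices are 0-based inside the code.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi] (0-based, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# Convert back to the 1-based positions used on the slide.
position = rmq(array, table, 0, 4) + 1
print(position)  # 3
```

For the array of the slide, the minimum over the whole range sits at position 3 (value 0).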

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D         [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]
Nodes       ⊤ C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 ⊤ C6 C13 C7 C13 C8 C13 C9 C13 C6 ⊤
Positions   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Steps shown, with the partial result after each query:

RMQ(4, 4) = 4, giving {4}
RMQ(4, 18) = 15, giving {4, 15}
RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21} (positions 4 and 21 correspond to C1 and C13)
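The naive approach can be retraced with the following sketch. The tree is read off the node sequence of the slide and written as a hypothetical `CHILDREN` map with "T" for ⊤; the RMQ here is a plain linear scan, an assumption made for brevity rather than the constant-time structure discussed earlier.

```python
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root):
    """Return (nodes, depths) of the Euler tour of the tree."""
    nodes, depths = [], []
    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)   # come back to the parent after each child
            depths.append(depth)
    visit(root, 0)
    return nodes, depths

NODES, D = euler_tour("T")
FIRST = {}
for pos, node in enumerate(NODES, start=1):
    FIRST.setdefault(node, pos)  # first 1-based position of each node

def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])

def lcs(x, y):
    """Node found at the minimum-depth position between x and y."""
    return NODES[rmq(FIRST[x], FIRST[y]) - 1]

def naive_intersection(a, b):
    """One RMQ per pair, then keep only the most specific nodes."""
    candidates = {lcs(x, y) for x in a for y in b}
    return {m for m in candidates
            if not any(m != x and lcs(m, x) == m for x in candidates)}

print(naive_intersection(["C1", "C5", "C8"], ["C1", "C7", "C9"]))  # C1 and C13
```

The candidate set before filtering is {C1, C12, ⊤, C13}; C12 and ⊤ are dropped because they lie above C1.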

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element of the antichain.

[Figure: tree with root ⊤; ⊤ has the children C12 and C6; C12 has the children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has the child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Using such a reduction is still computationally more efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
    • Heterogeneous Pattern Structures
    • RDF Pattern Structures

4 Creating Views over RDF-Graphs

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i")
  )
}

[Figure: graph pattern of the query: ?paper dc:creator ?author; ?paper dc:title ?title; ?paper dc:subject ?keywords]

Mapping $\mu : V \to U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \to U$, e.g., $\mu_1$: ?keywords $\to$ Classification.

[Figure: one match of the pattern: NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:title title1; NapoliLS97 dc:subject Classification]

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

?title | ?keyword | ?author
title1 | RDF | author1
title2 | Web Crawling | author1
title2 | Web Search Engines | author2
title2 | RDF | author1
title2 | RDF | author2
title2 | Web Crawling | author1

• $E_v = \{?title\}$

• $A_v = \{?author, ?keyword\}$

• $G = \mu(E_v) = \{title1, title2, \dots\}$

• $M_1 = \mu(A_{v1}) = \{author1, author2, \dots\}$

• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \dots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
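How a VIEW BY clause could turn query answers into a formal context can be sketched as follows. This is only an illustration: the rows are hypothetical answers in the spirit of the table, and `view_by` is an assumed helper, not the implementation behind Lattice-Based View Access. The values of the VIEW BY variable become the objects $G$, and the values of every other variable become attributes.

```python
# Hypothetical answers (title, keyword, author) of the SPARQL query.
COLUMNS = ["title", "keyword", "author"]
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "RDF", "author2"),
]

def view_by(rows, columns, view_column):
    """Map each value of view_column to the set of (column, value) attributes."""
    idx = columns.index(view_column)
    context = {}
    for row in rows:
        attrs = context.setdefault(row[idx], set())
        attrs.update((col, val) for col, val in zip(columns, row)
                     if col != view_column)
    return context

by_title = view_by(ROWS, COLUMNS, "title")    # objects are titles
by_author = view_by(ROWS, COLUMNS, "author")  # objects are authors
print(sorted(by_title["title2"]))
```

Changing the VIEW BY variable only changes which column supplies the objects, which is what produces a view from a different perspective over the same answers.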

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis, Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: ?x rdf:type Person; ?x dbp:birthPlace Berlin; ?x dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: ?x dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Figure: ?x dcterms:subject FrenchFilm; ?x rdf:type Film; ?x dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates (index, URI) | Objects (index, URI)
A dc:subject | a dbpc:Computer_Scientists, b dbpc:Turing_Award_Laureates
B dbp:award | c dbp:TuringAward
C rdf:type | d dbo:Scientist
D dbp:field | e dbp:Computer Sciences
E dbp:birthPlace | f dbo:UnitedStates, g dbo:UnitedKingdom

Context columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1 × × × × × ×
Person2 × × × × ×
Person3 × × × × ×
Person4 × × × ×
Person5 × × × ×
Person6 × ×
Person7 × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data.

Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | Everyone whose field is computer science is a scientist who has won a Turing Award
$c, d \Rightarrow a, b$ | 100% | 5 | All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
- $c, d \Rightarrow a, b$
- $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description.
• In fact, there is such a definition $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
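The confidence values above can be recomputed on a small context. The sketch below is an illustration under an explicit assumption: the column positions of the crosses are not recoverable from the slide, so the placement of e, f, and g in `CONTEXT` is hypothetical and chosen only to agree with the stated supports and confidences. The helper names are likewise illustrative.

```python
# Hypothetical cross placement, consistent with the rule table above.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "g"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attrs <= desc}

def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    return len(extent(premise | conclusion)) / len(covered) if covered else 0.0

def candidates_for_completion(premise, conclusion):
    """Entities that satisfy the premise but miss part of the conclusion."""
    return extent(premise) - extent(conclusion)

print(round(confidence({"a", "b"}, {"c", "d"}), 2))             # 0.71
print(sorted(candidates_for_completion({"a", "b"}, {"c", "d"})))  # Person6, Person7
```

When one direction holds with confidence 1 and the other with high confidence, the entities returned by `candidates_for_completion` are the ones whose descriptions would be completed if the rule is accepted as a definition.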

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow: Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset | Cars | Videogames | Smartphones | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates | rdf:type | rdf:type | rdf:type | rdf:type
 | dc:subject | dc:subject | dc:subject | dc:subject
 | bodyStyle | cp | manufacturer | language
 | transmission | developer | operativeSystem | govenmentType
 | assembly | requirement | developer | leaderType
 | designer | genre | cpu | foundingDate
 | layout | releaseDate | | gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset | Cars | Videogames | Smartphones | Countries
Subjects | 529 | 655 | 363 | 3153
Objects | 1291 | 3265 | 495 | 8315
Triples | 12519 | 20146 | 4710 | 50000
Concepts | 14657 | 31031 | 1232 | 13754
Exec time [s] | 1732 | 1714 | 07 | 5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation serves as the ground truth; each list of implications has 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for Cars, Smartphones, Countries, and Videogames]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team

• Altering the navigation space

• Navigation across points of view

• Hiding non-interesting parts of the lattice

• Search capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account.

• Perform large-scale experiments.


Thank you for your attention

8 References

References I

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

References II

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

References III

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

References IV

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 64

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 65

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 66

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set (M, ≤_M) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal Idl can be described by the set of its maximal elements: Idl = {x | ∃y ∈ M: x ≤ y}

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable

• In the associated pattern structure (G, (D, ⊑), δ), the semilattice (D, ⊓) of patterns consists of all antichains of the ordered attribute set

• For two antichains AC1 and AC2, the infimum AC1 ⊓ AC2 consists of all maximal elements of the order ideal {m | ∃ac1 ∈ AC1, ∃ac2 ∈ AC2: m ≤ ac1 and m ≤ ac2}

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in O(n) preprocessing time and O(1) time per query (where n is the number of elements in the array)
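The RMQ problem stated on this slide can be sketched as follows. This is only an illustration under stated assumptions: it uses a sparse table, which needs O(n log n) preprocessing rather than the O(n) scheme quoted on the slide, while queries remain O(1). The class name SparseTableRMQ and the 1-based positions are choices made for this sketch and do not come from the slides.

```python
class SparseTableRMQ:
    """Range Minimum Query via a sparse table (hypothetical helper, not from the slides)."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] = index of the minimum of values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            k += 1

    def query(self, lo, hi):
        """Return the 1-based position of the minimum in positions lo..hi (inclusive)."""
        i, j = lo - 1, hi - 1  # convert to 0-based indices
        if i > j:
            i, j = j, i
        k = (j - i + 1).bit_length() - 1
        left = self.table[k][i]
        right = self.table[k][j - (1 << k) + 1]
        best = left if self.values[left] <= self.values[right] else right
        return best + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])  # the array from the slide
print(rmq.query(1, 5))  # 3: the value 0 sits at position 3
print(rmq.query(4, 5))  # 5: among positions 4..5 the minimum is at position 5
```

The two overlapping power-of-two windows in query are what make each query constant time once the table is built.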

Naive Approach

For A = {C1, C5, C8} and B = {C1, C7, C9}, we have A = {4, 12, 20} and B = {4, 18, 22} as positions in the Euler tour of the tree

Pos   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node  ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D     0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

One RMQ is computed over the depth array D for each pair of positions taken from A and B:

RMQ(4, 4) = 4      collected so far: {4}
RMQ(4, 18) = 15    collected so far: {4, 15}
RMQ(20, 18) = 19   collected so far: {4, 15, 19}

The remaining pairs contribute positions 8 and 21.

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e. the antichain {C1, C13}: positions 8 (C12) and 15 (⊤) are ancestors of position 4 (C1), and positions 19 and 21 both denote C13
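The naive approach above can be reproduced with a short sketch. It assumes the tree read off the Euler tour (⊤ over C12 and C6, and so on), writes the root ⊤ as "TOP", and answers each RMQ by a plain linear scan instead of a constant-time structure. All function names are hypothetical.

```python
# Tree from the slides, written as child -> parent; "TOP" stands for the root ⊤.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def euler_tour(parent, root="TOP"):
    """Return node labels and depths along an Euler tour of the tree."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    labels, depths = [], []

    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            labels.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return labels, depths


def rmq(depths, i, j):
    """0-based position of the minimal depth between i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=depths.__getitem__)


def ancestors(node):
    """Strict ancestors of a node according to PARENT."""
    found = set()
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def naive_intersection(a, b):
    labels, depths = euler_tour(PARENT)
    first = {}
    for pos, label in enumerate(labels):
        first.setdefault(label, pos)  # first occurrence in the tour
    # One RMQ per pair (x, y), hence O(|A||B|) queries.
    candidates = {labels[rmq(depths, first[x], first[y])] for x in a for y in b}
    # Drop every candidate that is an ancestor of another candidate.
    covered = set()
    for node in candidates:
        covered |= ancestors(node)
    return candidates - covered


print(sorted(naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
# ['C1', 'C13']
```

The candidate set before filtering is {C1, C12, ⊤, C13}, which corresponds to positions {4, 8, 15, 19, 21} on the slide.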

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

A = {C1, C5, C8}, Fil(A) = {C1, C10, C12, ⊤, C5, C11, C8, C13, C6}

B = {C1, C7, C9}, Fil(B) = {C1, C10, C12, ⊤, C7, C9, C13, C6}

Fil(A) ∩ Fil(B) = {C1, C10, C12, ⊤, C13, C6}, which yields the antichain {C1, C13}
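The scaling to filters can be checked with a few lines of code. The sketch below assumes the same tree as in the example (root ⊤ written as "TOP") and uses hypothetical helper names. It walks up the parent links to build each filter, intersects the two filters as plain sets, and keeps the minimal elements of the intersection.

```python
# Tree from the slides, written as child -> parent; "TOP" stands for the root ⊤.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All elements greater than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        result.add(node)
        while node in PARENT:
            node = PARENT[node]
            result.add(node)
    return result


def minimal_elements(nodes):
    """Elements of `nodes` that have no strict descendant inside `nodes`."""
    with_descendant = set()
    for node in nodes:
        while node in PARENT:
            node = PARENT[node]
            with_descendant.add(node)
    return nodes - with_descendant


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
print(sorted(fil_a & fil_b))                    # ['C1', 'C10', 'C12', 'C13', 'C6', 'TOP']
print(sorted(minimal_elements(fil_a & fil_b)))  # ['C1', 'C13']
```

Each filter can contain up to every node of the tree, which is why this route costs O(|T|) per intersection.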

Complexity

Complexity of Naive Approach

The number of RMQs is O(|A||B|), one per pair of elements

Complexity of Improved Approach

The cardinality of A ∪ B is at most |A| + |B|, hence the number of consecutive elements is O(|A| + |B|), and thus the number of RMQs of consecutive elements is O(|A| + |B|)

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is O(|T|), and thus the computational complexity of intersecting two antichains by means of a scaling is O(|T|), which is more costly than O(|Leaves(T)|) for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
    • Heterogeneous Pattern Structures
    • RDF Pattern Structures

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER (
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Figure: graph pattern with ?paper linked to ?author (dc:creator), ?title (dc:title) and ?keywords (dc:subject)]

Mapping μ: V → U

Given U and V, a mapping μ is a partial function μ: V → U, e.g. μ1: ?keywords → Classification

[Figure: instantiated graph pattern with NapoliLS97 linked to Amedeo_Napoli (dc:creator), title1 (dc:title) and Classification (dc:subject)]

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title  | keyword            | author
title1 | RDF                | author1
title2 | Web Crawling       | author1
title2 | Web Search Engines | author2
title2 | RDF                | author1
title2 | RDF                | author2
title2 | Web Crawling       | author1

• Ev = {title}

• Av = {author, keyword}

• G = μ(Ev) = {title1, title2, ...}

• M1 = μ(Av1) = {author1, author2, ...}

• M2 = μ(Av2) = {Web Crawling, RDF, ...}

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014
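The way a VIEW BY clause turns a result table into a formal context can be sketched as follows. This is an illustration only, not the authors' implementation: the rows are the ones shown in the table above (with the repeated row written once), the column chosen by VIEW BY provides the objects G, and every other column value becomes an attribute tagged with its column name. The function names are hypothetical.

```python
from collections import defaultdict

COLUMNS = ("title", "keyword", "author")

# Answer rows of the query, copied from the slide.
ROWS = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
]


def build_context(rows, view_by=0):
    """Objects come from the VIEW BY column; the other columns give attributes."""
    context = defaultdict(set)
    for row in rows:
        obj = row[view_by]
        for col, value in enumerate(row):
            if col != view_by:
                context[obj].add((COLUMNS[col], value))
    return dict(context)


def common_attributes(context, objects):
    """Derivation operator: attributes shared by all the given objects."""
    descriptions = [context[o] for o in objects]
    return set.intersection(*descriptions) if descriptions else set()


by_title = build_context(ROWS, view_by=0)
print(sorted(common_attributes(by_title, ["title1", "title2"])))
# [('author', 'author1'), ('keyword', 'RDF')]

by_author = build_context(ROWS, view_by=2)
print(sorted(by_author["author2"]))
# [('keyword', 'RDF'), ('keyword', 'Web Search Engines'), ('title', 'title2')]
```

Switching view_by from the title column to the author column gives a different context over the same answers, which is the idea behind viewing the same query result from different perspectives.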

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern with ?x linked to Person (rdf:type), Berlin (dbp:birthPlace) and a value ≤ 1900 (dbp:birthDate)]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Figure: graph pattern with ?x linked to FrenchFilm (dcterms:subject), Film (rdf:type) and France (dbo:hasCountry)]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm
}

SELECT ?x
WHERE {
  ?x rdf:type dbp:Country .
  ?x dbo:hasCountry dbp:France
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates           | Objects
Index | URI          | Index | URI
A     | dc:subject   | a     | dbpc:Computer_Scientists
      |              | b     | dbpc:Turing_Award_Laureates
B     | dbp:award    | c     | dbp:TuringAward
C     | rdf:type     | d     | dbo:Scientist
D     | dbp:field    | e     | dbp:Computer Sciences
E     | dbp:birthPlace | f   | dbo:UnitedStates
      |              | g     | dbo:UnitedKingdom

Formal context over the predicates A B C D E and the objects a b c d e f g:

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>
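The scaling from triples to a formal context can be sketched in a few lines. This is a minimal illustration under assumptions: the Person1 triples are the ones listed on the slide, while the two Person6 triples are hypothetical and only mirror the two crosses shown for Person6 in the table. Function names are invented for this sketch.

```python
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
    # Hypothetical triples, added only to show an incomplete description.
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]


def triples_to_context(triples):
    """Subjects become objects G, (predicate, object) pairs become attributes M."""
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


def cross_table(objects, attributes, incidence):
    """One text line per object, 'x' where the triple exists and '.' otherwise."""
    lines = []
    for g in objects:
        marks = " ".join("x" if (g, m) in incidence else "." for m in attributes)
        lines.append(f"{g}: {marks}")
    return lines


G, M, I = triples_to_context(TRIPLES)
for line in cross_table(G, M, I):
    print(line)
```

Each printed x corresponds to exactly one input triple, matching the reading of the crosses given on the slide.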

Getting definitions from implications and association rules

• If X ⟹ Y and Y ⟹ X, then X ≡ Y
• X ≡ Y is called a definition
• If X ⟹ Y holds and Y → X has high confidence, the data may be incomplete

Rule          | Confidence | Support | Meaning
d ⟹ c         | 100%       | 5       | Every scientist has won a Turing Award
c ⟹ d         | 100%       | 5       | Every person who has won a Turing Award is a scientist
e ⟹ c, d      | 100%       | 2       | Every person whose field is computer science is a scientist who has won a Turing Award
c, d ⟹ a, b   | 100%       | 5       | All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d   | 71%        | 7       | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - c, d ⟹ a, b
  - conf({a, b} → {c, d}) = 0.71

• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition, a, b ≡ c, d, i.e. a, b ⟹ c, d and c, d ⟹ a, b
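The confidence values in the rule table can be recomputed on a small context. The sketch below is an assumption-laden illustration: it restricts the running example to the attributes a, b, c, d and assigns them to the seven persons in the only way compatible with the rule table (a and b for everyone, c and d for Person1 to Person5). The helper names are hypothetical.

```python
# Toy context restricted to attributes a-d (an assumption consistent with the rule table).
CONTEXT = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, description in CONTEXT.items() if attrs <= description}


def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    covered = extent(premise)
    if not covered:
        return 0.0
    return len(extent(premise | conclusion)) / len(covered)


def candidates_for_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(extent(premise) - extent(premise | conclusion))


print(confidence({"c", "d"}, {"a", "b"}))            # 1.0
print(round(confidence({"a", "b"}, {"c", "d"}), 2))  # 0.71
print(candidates_for_completion({"a", "b"}, {"c", "d"}))
# ['Person6', 'Person7']
```

Since one direction is an implication and the other holds with confidence 5/7, the two entities that break the second rule are the ones flagged for completion.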

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition X ≡ Y?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015

Experimentation

Dataset building conditions

Dataset     | Cars             | Videogames  | Smartphones      | Countries
Restriction | dc:subject       | dc:subject  | dc:subject       | rdf:type
            | dbpc:Sports_cars | dbpc:FPS    | dbpc:Smartphones | Country
Predicates  | rdf:type         | rdf:type    | rdf:type         | rdf:type
            | dc:subject       | dc:subject  | dc:subject       | dc:subject
            | bodyStyle        | cp          | manufacturer     | language
            | transmission     | developer   | operativeSystem  | governmentType
            | assembly         | requirement | developer        | leaderType
            | designer         | genre       | cpu              | foundingDate
            | layout           | releaseDate |                  | gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset       | Cars  | Videogames | Smartphones | Countries
Subjects      | 529   | 655        | 363         | 3153
Objects       | 1291  | 3265       | 495         | 8315
Triples       | 12519 | 20146      | 4710        | 50000
Concepts      | 14657 | 31031      | 1232        | 13754
Exec time [s] | 1732  | 1714       | 07          | 5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e. X → Y and Y → X are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments

Thank you for your attention

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

Why Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph around 350GT, with nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edge labels dc:subject, dc:subject, rdf:type, dbo:manufacturer, dbo:productionYear]

Contexts K_A, K_B, K_C, K_D, K_E over the attributes a b c d e f g, and K_dbo:productionStartYear:

Entity             | a b c d e f g | K_dbo:productionStartYear
Reventon           | × × × × × ×   | ⟨[2008, 2008]⟩
Countach           | × × × × ×     | ⟨[1974, 1974]⟩
350GT              | × × × × ×     | ⟨[1963, 1963]⟩
400GT              | × × × ×       | ⟨[1965, 1965]⟩
Islero             | × × × ×       | ⟨[1967, 1967]⟩
Veneno             | × ×           | ⟨[2012, 2012]⟩
Aventador Roadster | × ×           | -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate p ∈ P
  - range(p) ⊆ U: p is an object relation
  - range(p) ⊆ L: p is a literal relation

• Pattern Structures K_p = (G, (D_p, ⊓), δ_p)
  - (D_p, ⊑) is an ordered set of descriptions defined for the elements in range(p)
  - δ_p maps entities g ∈ G to their descriptions in D_p

• Heterogeneous Pattern Structures (G, H, Δ)
  - H = ⨉ D_p (p ∈ P) is the Cartesian product of all the description sets D_p
  - Δ maps an entity g ∈ G to a tuple where each component corresponds to a description in a set D_p

Heterogeneous Pattern Structures

Let A1 = {350GT, 400GT, Islero}, then

• (A1)′′
  - (A1)′ = {a, b, c, d}
  - {a, b, c, d}′ = {Reventon, Countach, 350GT, 400GT, Islero}
  - (A1)′′ = {Reventon, Countach, 350GT, 400GT, Islero}

• (A1)□□
  - (A1)□ = ⟨[1963, 1967]⟩ and ⟨[1963, 1967]⟩□ = A1
  - (A1)□□ = A1

• (A1)⋄⋄ = {Reventon, Countach, 350GT, 400GT, Islero} ∩ A1 = A1

• Intent of this heterogeneous pattern concept: ({a, b}, {c}, {d}, ⟨[1963, 1967]⟩)

"Automobiles manufactured by Lamborghini between 1963 and 1967"
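The numeric part of this example can be checked with a small sketch of interval patterns, in the spirit of [Kaytoue et al. 2011]. It assumes that the similarity of two intervals is their convex hull and that an entity matches an interval when its year lies inside it. The years are those of the heterogeneous context, with Aventador Roadster left out because it has no value. Function names are hypothetical.

```python
from functools import reduce

# Production start years from the heterogeneous context.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}


def meet(first, second):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(first[0], second[0]), max(first[1], second[1]))


def describe(entities):
    """Common interval description of a non-empty set of entities."""
    intervals = [(YEARS[e], YEARS[e]) for e in entities]
    return reduce(meet, intervals)


def matching(interval):
    """Entities whose year falls inside the interval."""
    low, high = interval
    return {e for e, year in YEARS.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
pattern = describe(A1)
print(pattern)                   # (1963, 1967)
print(sorted(matching(pattern)))  # ['350GT', '400GT', 'Islero']
```

Applying describe and then matching returns A1 itself, which agrees with the closure stated on the slide for the numeric component.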


sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction

sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling

Mehwish Alam Interactive Knowledge Discovery over Web of Data 72

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
    • Heterogeneous Pattern Structures
    • RDF Pattern Structures
Page 54: Interactive Knowledge Discovery over Web of Data

SPARQL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?title ?keywords ?author
WHERE {
  ?paper dc:creator ?author .
  ?paper dc:title ?title .
  ?paper dc:subject ?keywords .
  FILTER(
    regex(STR(?keywords), "pattern based classification", "i")
    || regex(STR(?keywords), "unsupervised classification", "i"))
}

[Graph pattern: ?paper dc:creator ?author; ?paper dc:title ?title; ?paper dc:subject ?keywords]

Mapping $\mu : V \rightarrow U$

Given $U$ and $V$, a mapping $\mu$ is a partial function $\mu : V \rightarrow U$, e.g., $\mu_1$: ?keywords $\rightarrow$ Classification

[Matched graph: NapoliLS97 dc:creator Amedeo_Napoli; NapoliLS97 dc:title title1; NapoliLS97 dc:subject Classification]

Mehwish Alam Interactive Knowledge Discovery over Web of Data 31

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

title    keyword              author
title1   RDF                  author1
title2   Web Crawling         author1
title2   Web Search Engines   author2
title2   RDF                  author1
title2   RDF                  author2
title2   Web Crawling         author1

• $E_v = \{\text{title}\}$
• $A_v = \{\text{author}, \text{keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}\}$
• $M_1 = \mu(A_{v1}) = \{\text{author1}, \text{author2}\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
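The way VIEW BY splits a result table into a set of objects and one attribute set per remaining variable can be sketched as follows. This is only an illustrative sketch, not the implementation from the cited paper: the function name `view_by` and the dictionary layout are assumptions, and the rows are the ones listed in the result table above.

```python
from collections import defaultdict

# Result rows (title, keyword, author) copied from the table on the slide.
rows = [
    ("title1", "RDF", "author1"),
    ("title2", "Web Crawling", "author1"),
    ("title2", "Web Search Engines", "author2"),
    ("title2", "RDF", "author1"),
    ("title2", "RDF", "author2"),
    ("title2", "Web Crawling", "author1"),
]
columns = ("title", "keyword", "author")


def view_by(rows, columns, object_var):
    """Hypothetical helper: one context per attribute variable,
    each mapping an object to the set of values it is related to."""
    obj_idx = columns.index(object_var)
    contexts = {c: defaultdict(set) for c in columns if c != object_var}
    for row in rows:
        for i, col in enumerate(columns):
            if i != obj_idx:
                contexts[col][row[obj_idx]].add(row[i])
    return contexts


contexts = view_by(rows, columns, "title")
G = sorted(contexts["author"])                    # objects: the distinct titles
M1 = set().union(*contexts["author"].values())    # attributes from ?author
M2 = set().union(*contexts["keyword"].values())   # attributes from ?keyword
```

Calling `view_by(rows, columns, "author")` instead would give the second perspective, with authors as objects and titles and keywords as attributes.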

Views from Different Perspectives

Example

SELECT ?title ?author ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?title

Example

SELECT ?author ?title ?keyword WHERE {
  ?paper dc:title ?title .
  ?paper dc:creator ?author .
  ?paper dc:subject ?keyword .
  FILTER (regex(?keyword, "Web Crawling", "i") ||
          regex(?keyword, "RDF", "i"))
}
VIEW BY ?author

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Graph: ?x rdf:type Person; ?x dbp:birthPlace Berlin; ?x dbp:birthDate ≤ 1900]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Person .
  ?x dbp:birthPlace dbp:Berlin .
  ?x dbp:birthDate ?d .
  FILTER (?d <= 1900)
}

French Films

[Graph: ?x dcterms:subject FrenchFilm]

SELECT ?x
WHERE {
  ?x dcterms:subject category:FrenchFilm .
}

[Graph: ?x rdf:type Film; ?x dbo:hasCountry France]

SELECT ?x
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:hasCountry dbp:France .
}

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer_Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates                Objects
Index  URI                Index  URI
A      dc:subject         a      dbpc:Computer_Scientists
                          b      dbpc:Turing_Award_Laureates
B      dbp:award          c      dbp:TuringAward
C      rdf:type           d      dbo:Scientist
D      dbp:field          e      dbp:Computer_Sciences
E      dbp:birthPlace     f      dbo:UnitedStates
                          g      dbo:UnitedKingdom

Predicates: A B C D E
Objects:    a b c d e f g

Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
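The scaling step can be illustrated with a short sketch. It assumes, as the slide suggests, that every (predicate, object) pair becomes one binary attribute; the function name `to_formal_context` is hypothetical and the prefixed names are treated as plain strings.

```python
# Triples for Person1 as listed on the slide (prefixed names kept as strings).
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer_Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def to_formal_context(triples):
    """Hypothetical helper: scale triples into a formal context (G, M, I).

    G: subjects, M: (predicate, object) pairs, I: one cross per triple.
    """
    objects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    incidence = {(s, (p, o)) for s, p, o in triples}
    return objects, attributes, incidence


G, M, I = to_formal_context(triples)
```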

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule                        Confidence  Support  Meaning
$d \Rightarrow c$           100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$        100%        2        Every person with the field computer science is a Turing Award winning scientist
$c, d \Rightarrow a, b$     100%        5        All scientists who won a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
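The figures in the rule table can be reproduced with a few lines of code. The sketch below is an illustration only: the transcript lost the column positions of the crosses, so the context used here is an assumed placement chosen to agree with the number of crosses per person and with the reported supports and confidences. As in the table, the support of a rule is taken to be the number of entities satisfying its premise.

```python
# Assumed context: cross positions are NOT from the slide, only the counts are.
context = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def extent(attrs):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in context.items() if set(attrs) <= desc}


def premise_support(lhs):
    """Number of entities satisfying the premise of a rule."""
    return len(extent(lhs))


def confidence(lhs, rhs):
    """Fraction of the entities satisfying lhs that also satisfy rhs."""
    return len(extent(set(lhs) | set(rhs))) / len(extent(lhs))


conf_ab_cd = confidence("ab", "cd")          # 5 of the 7 entities
conf_cd_ab = confidence("cd", "ab")          # an implication: confidence 1.0
to_complete = extent("ab") - extent("cd")    # candidates for completion
```

With this placement, `to_complete` contains exactly the two entities that the slide flags as needing completion.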

Possible Scenario [Alam et al., IJCAI 2015]

[Flowchart: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames    Smartphones        Countries
Restriction   dc:subject         dc:subject    dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS      dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type      rdf:type           rdf:type
              dc:subject         dc:subject    dc:subject         dc:subject
              bodyStyle          cp            manufacturer       language
              transmission       developer     operativeSystem    governmentType
              assembly           requirement   developer          leaderType
              designer           genre         cpu                foundingDate
              layout             releaseDate                      gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec. time [s]  17.32   17.14        0.7           59.82

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset
• Human evaluation as the ground truth: each list of implications has a 100% recall
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Points of View
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]
• How to deal with near-definitions, i.e., $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data
• Perform large-scale experiments


Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

Why Pattern Structures

• Triples do not always contain URIs as objects
• They may have different data types and structures, including dates, numbers, collections and strings
• To deal with such data we use Pattern Structures

[Graph: 350GT dc:subject dbpc:Sports_cars; 350GT dc:subject dbpc:Lamborghini_vehicles; 350GT rdf:type dbo:Automobile; 350GT dbo:manufacturer dbp:Lamborghini; 350GT dbo:productionYear 1964 (xsd:date)]

Contexts:   K_A  K_B  K_C  K_D  K_E  K_dbo:productionStartYear
Attributes: a b c d e f g

Reventon             × × × × × ×   ⟨[2008, 2008]⟩
Countach             × × × × ×     ⟨[1974, 1974]⟩
350GT                × × × × ×     ⟨[1963, 1963]⟩
400GT                × × × ×       ⟨[1965, 1965]⟩
Islero               × × × ×       ⟨[1967, 1967]⟩
Veneno               × ×           ⟨[2012, 2012]⟩
Aventador Roadster   × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts:   K_A  K_B  K_C  K_D  K_E  K_dbo:productionStartYear
Attributes: a b c d e f g

Reventon             × × × × × ×   ⟨[2008, 2008]⟩
Countach             × × × × ×     ⟨[1974, 1974]⟩
350GT                × × × × ×     ⟨[1963, 1963]⟩
400GT                × × × ×       ⟨[1965, 1965]⟩
Islero               × × × ×       ⟨[1967, 1967]⟩
Veneno               × ×           ⟨[2012, 2012]⟩
Aventador Roadster   × ×           -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\circ\circ}$ (binary attributes):
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$ (interval descriptions):
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
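The numeric part of this example can be checked with a small sketch of an interval pattern structure. It is a minimal illustration, assuming that the similarity of two intervals is their convex hull and that an entity matches a description when its own interval lies inside it; the function names are hypothetical, and only the production years from the table are used.

```python
# Production start years from the table; each entity is described by [y, y].
# Aventador Roadster has no value in the table and is left out.
years = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}


def describe(entities):
    """Common description of a set of entities: the smallest interval
    covering all their values (convex hull of the [y, y] intervals)."""
    values = [years[e] for e in entities]
    return (min(values), max(values))


def matching(interval):
    """Entities whose own interval is included in the given interval."""
    lo, hi = interval
    return {e for e, y in years.items() if lo <= y <= hi}


A1 = {"350GT", "400GT", "Islero"}
d = describe(A1)        # the interval description of A1
closure = matching(d)   # entities sharing that description
```

Here `closure` equals `A1`, in line with the numeric closure stated on the slide.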

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• Extended set of attributes $M^*$, organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   C1 C2 C4 C5 C6 C7 C8 C9
s1           x x x
s2           x x x
s3           x x
s4           x x x
s5           x x

[Figure: subsumption hierarchy rooted at ⊤ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$
• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved with $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
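A common way to answer such queries is a sparse table, sketched below on the array from the slide. Note that this is a swapped-in, simpler technique: it needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted above, while still answering each query in $O(1)$. The function names are illustrative.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]
    (leftmost index on ties)."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi] (0-based, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# The slide numbers positions from 1, so shift the 0-based index by one.
position = rmq(array, table, 0, 4) + 1
```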

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the positions of the elements in the array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$, i.e., $\{C1, C13\}$
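The pairwise procedure can be sketched as follows, using the array from the slide. This is an illustrative sketch, not the authors' code: the RMQ is a plain linear scan (any RMQ structure could be substituted), the root is written `"TOP"`, and the function names are assumptions.

```python
# Euler tour and depths copied from the slide; list index i holds position i + 1.
tour = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
depth = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depth[p - 1])


def is_ancestor(u, v):
    """True if the node at position u is a strict ancestor of the node at v."""
    return tour[u - 1] != tour[v - 1] and tour[rmq(u, v) - 1] == tour[u - 1]


def naive_meet(A, B):
    """Least common subsumers of all pairs, keeping the most specific ones."""
    positions = {rmq(a, b) for a in A for b in B}      # |A| * |B| queries
    nodes = {tour[p - 1] for p in positions}
    first = {n: tour.index(n) + 1 for n in nodes}      # one position per node
    return {n for n in nodes
            if not any(is_ancestor(first[n], first[m]) for m in nodes)}


pairwise = {rmq(a, b) for a in (4, 12, 20) for b in (4, 18, 22)}
result = naive_meet([4, 12, 20], [4, 18, 22])
```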

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Tree: ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
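The scaling to filters can be sketched in a few lines. The parent map below is read off the tree in the example, with the root written `"TOP"`; the function names are illustrative, and a filter is computed by simply walking up the parent chain.

```python
# Parent of each node in the example tree; the root "TOP" has no entry.
parent = {
    "C12": "TOP", "C6": "TOP", "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes greater than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = parent.get(node)
    return result


def minimal_elements(nodes):
    """Nodes of the set that are not a strict ancestor of another node in it."""
    strict_ancestors = set()
    for node in nodes:
        strict_ancestors |= filter_of([node]) - {node}
    return nodes - strict_ancestors


fil_a = filter_of(["C1", "C5", "C8"])
fil_b = filter_of(["C1", "C7", "C9"])
meet = minimal_elements(fil_a & fil_b)
```

Both filters contain every ancestor up to the root, which is why their size grows with the size of the tree rather than with the size of the antichains.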

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation amp Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 55: Interactive Knowledge Discovery over Web of Data

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE

paper dctitle title

paper dccreator author

paper dcsubject keyword

filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))

title keyword author

title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1

Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam

Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014

Mehwish Alam Interactive Knowledge Discovery over Web of Data 32

Lattice-Based View Access [Alam et al., CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLP:

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") || regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

    title    keyword              author
    title1   RDF                  author1
    title2   Web Crawling         author1
    title2   Web Search Engines   author2
    title2   RDF                  author1
    title2   RDF                  author2
    title2   Web Crawling         author1

• $E_v = \{\text{title}\}$
• $A_v = \{\text{author}, \text{keyword}\}$
• $G = \mu(E_v) = \{\text{title1}, \text{title2}, \ldots\}$
• $M_1 = \mu(A_{v1}) = \{\text{author1}, \text{author2}, \ldots\}$
• $M_2 = \mu(A_{v2}) = \{\text{Web Crawling}, \text{RDF}, \ldots\}$

Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results. Mehwish Alam, Amedeo Napoli. International Conference on Concept Lattices and Their Applications, 2014.
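The grouping step behind VIEW BY can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `build_view` and the sample bindings are hypothetical, and the authors' actual implementation may differ. The variable named in VIEW BY supplies the objects G, and every other selected variable supplies one attribute set.

```python
from collections import defaultdict

def build_view(rows, view_by, attributes):
    """Group SPARQL result rows into one formal context per attribute variable.

    rows       -- list of dicts, one per solution (variable name -> value)
    view_by    -- the variable named in VIEW BY (its values become the objects G)
    attributes -- the remaining variables (each yields one attribute set M_i)
    """
    objects = sorted({row[view_by] for row in rows})
    contexts = {}
    for var in attributes:
        incidence = defaultdict(set)
        for row in rows:
            incidence[row[view_by]].add(row[var])
        contexts[var] = {g: incidence[g] for g in objects}
    return objects, contexts

# Hypothetical bindings in the spirit of the result table (not real DBLP data).
rows = [
    {"title": "title1", "keyword": "RDF", "author": "author1"},
    {"title": "title2", "keyword": "Web Crawling", "author": "author1"},
    {"title": "title2", "keyword": "RDF", "author": "author2"},
]
G, contexts = build_view(rows, "title", ["author", "keyword"])
```

Calling `build_view(rows, "author", ["title", "keyword"])` on the same rows would give the view from the author's perspective.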

Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Graph: x rdf:type Person; x dbp:birthPlace Berlin; x dbp:birthDate d, with d ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Graph: x dcterms:subject FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

[Graph: x rdf:type Film; x dbo:hasCountry France; x dcterms:subject FrenchFilm]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientist>

    Predicates               Objects
    Index  URI               Index  URI
    A      dc:subject        a      dbpc:Computer_Scientists
                             b      dbpc:Turing_Award_Laureates
    B      dbp:award         c      dbp:TuringAward
    C      rdf:type          d      dbo:Scientist
    D      dbp:field         e      dbp:Computer Sciences
    E      dbp:birthPlace    f      dbo:UnitedStates
                             g      dbo:UnitedKingdom

Formal context, with predicate groups A (a, b), B (c), C (d), D (e), E (f, g) as columns:

    Person1  × × × × × ×
    Person2  × × × × ×
    Person3  × × × × ×
    Person4  × × × ×
    Person5  × × × ×
    Person6  × ×
    Person7  × ×

The formal context is built from DBpedia RDF triples after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
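A minimal Python sketch of this scaling step, assuming that every (predicate, object) pair simply becomes one binary attribute. The function name `triples_to_context` is hypothetical, and the sample triples only echo the running example.

```python
def triples_to_context(triples):
    """Map each subject to the set of (predicate, object) pairs it occurs with."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context

# Illustrative triples in the spirit of the running example.
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:award", "dbp:TuringAward"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]
context = triples_to_context(triples)

# The attribute set M is the union of all pairs seen for any subject.
attributes = sorted({m for row in context.values() for m in row})
```

Each entry of `context[g]` plays the role of one cross in the row of entity `g`.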

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data.

    Rule                        Confidence  Support  Meaning
    $d \Rightarrow c$           100%        5        Every scientist has won a Turing Award.
    $c \Rightarrow d$           100%        5        Every person who has won a Turing Award is a scientist.
    $e \Rightarrow c, d$        100%        2        Every person whose field is computer science is a scientist who has won a Turing Award.
    $c, d \Rightarrow a, b$     100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists.
    $a, b \rightarrow c, d$     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award.

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description.
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
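The confidence computation and the detection of entities to complete can be illustrated with a short Python sketch. The context below is an assumption: columns a to d follow the rule table, while the placement of e, f and g among Person1 to Person3 is a guess chosen only to match the cross counts and the support of e.

```python
def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return {g for g, row in context.items() if attrs <= row}

def confidence(context, premise, conclusion):
    """conf(premise -> conclusion) = |ext(premise ∪ conclusion)| / |ext(premise)|."""
    base = extent(context, premise)
    if not base:
        return 0.0
    return len(extent(context, premise | conclusion)) / len(base)

# Columns a-d follow the rule table; the placement of e, f, g is assumed.
context = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}
forward = confidence(context, {"c", "d"}, {"a", "b"})   # an implication
backward = confidence(context, {"a", "b"}, {"c", "d"})  # 5 out of 7 entities

# Entities that satisfy {a, b} but not {c, d} are the candidates for completion.
to_complete = extent(context, {"a", "b"}) - extent(context, {"c", "d"})
```

Here `forward` is 1.0, `backward` is 5/7 (about 0.71), and `to_complete` contains Person6 and Person7, in line with the slide.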

Possible Scenario [Alam et al., IJCAI 2015]

[Flowchart: Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

                 Cars              Videogames    Smartphones        Countries
    Restriction  dc:subject        dc:subject    dc:subject         rdf:type
                 dbpc:Sports_cars  dbpc:FPS      dbpc:Smartphones   Country
    Predicates   rdf:type          rdf:type      rdf:type           rdf:type
                 dc:subject        dc:subject    dc:subject         dc:subject
                 bodyStyle         cp            manufacturer       language
                 transmission      developer     operativeSystem    governmentType
                 assembly          requirement   developer          leaderType
                 designer          genre         cpu                foundingDate
                 layout            releaseDate                      gdpPppRank

    FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

                    Cars    Videogames  Smartphones  Countries
    Subjects        529     655         363          3153
    Objects         1291    3265        495          8315
    Triples         12519   20146       4710         50000
    Concepts        14657   31031       1232         13754
    Exec. time [s]  1732    1714        0.7          5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation is used as the ground truth; each list of implications has 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (0.6 to 1.0) against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

[Figure: user interface of RV-Xplorer (Rdf-View eXplorer)]

RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF Data using Association Rule Mining
• Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

[Figure: process of interactively exploring RDF data]

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
• How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
• Perform large-scale experiments.

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections and strings.
• To deal with such data, we use Pattern Structures.

[Graph: 350GT dc:subject dbpc:Sports_cars; 350GT dc:subject dbpc:Lamborghini_vehicles; 350GT rdf:type dbo:Automobile; 350GT dbo:manufacturer dbp:Lamborghini; 350GT dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Contexts $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g) and $K_{dbo:productionStartYear}$:

    Reventon            × × × × × ×   $\langle [2008, 2008] \rangle$
    Countach            × × × × ×     $\langle [1974, 1974] \rangle$
    350GT               × × × × ×     $\langle [1963, 1963] \rangle$
    400GT               × × × ×       $\langle [1965, 1965] \rangle$
    Islero              × × × ×       $\langle [1967, 1967] \rangle$
    Veneno              × ×           $\langle [2012, 2012] \rangle$
    Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts $K_A$ (a, b), $K_B$ (c), $K_C$ (d), $K_D$ (e), $K_E$ (f, g) and $K_{dbo:productionStartYear}$:

    Reventon            × × × × × ×   $\langle [2008, 2008] \rangle$
    Countach            × × × × ×     $\langle [1974, 1974] \rangle$
    350GT               × × × × ×     $\langle [1963, 1963] \rangle$
    400GT               × × × ×       $\langle [1965, 1965] \rangle$
    Islero              × × × ×       $\langle [1967, 1967] \rangle$
    Veneno              × ×           $\langle [2012, 2012] \rangle$
    Aventador Roadster  × ×           -

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
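The numeric part of this example can be reproduced with a small Python sketch of interval patterns, where the meet of two intervals is taken to be the smallest interval containing both. The function names are hypothetical and only the production start years from the table are used; this is a sketch under those assumptions, not the authors' code.

```python
def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def describe(objects, delta):
    """A': the meet of the descriptions of all objects in A (A must be non-empty)."""
    intervals = [delta[g] for g in objects]
    result = intervals[0]
    for interval in intervals[1:]:
        result = meet(result, interval)
    return result

def covered(interval, delta):
    """d': the objects whose description lies inside the interval d."""
    lo, hi = interval
    return {g for g, (a, b) in delta.items() if lo <= a and b <= hi}

# Production start years from the table; Aventador Roadster has no value.
delta = {
    "Reventon": (2008, 2008), "Countach": (1974, 1974), "350GT": (1963, 1963),
    "400GT": (1965, 1965), "Islero": (1967, 1967), "Veneno": (2012, 2012),
}
A1 = {"350GT", "400GT", "Islero"}
pattern = describe(A1, delta)      # (A1)'
closure = covered(pattern, delta)  # (A1)''
```

With these values `pattern` is the interval (1963, 1967) and `closure` equals `A1`. Intersecting it with the extent obtained from the binary contexts gives the extent of the heterogeneous concept.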

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.
• An extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^{*}}$.
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

    Entities S   C1 C2 C4 C5 C6 C7 C8 C9
    s1           x x x
    s2           x x x
    s3           x x
    s4           x x x
    s5           x x

[Figure: subsumption hierarchy with top element ⊤ over the attributes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$.
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is an ordered set whose elements are pairwise not comparable.
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

    Array      [ 2  1  0  3  2 ]
    Positions    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
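As an illustration of constant-time range minimum queries, here is a sparse-table sketch in Python. Note the assumption: a sparse table needs O(n log n) preprocessing, so it is a simpler stand-in for the O(n) preprocessing scheme quoted above, not that scheme itself. The function names are hypothetical.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, lo, hi):
    """Index of a minimal value in values[lo..hi] (0-based, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
position = rmq(array, table, 0, 4) + 1  # converted to the 1-based positions of the slide
```

For the example array, the minimum over the full range sits at position 3.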

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions in the tour below are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

    Position:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
    Node:       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
    Depth D:    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Step by step:

• $RMQ(4, 4) = 4$, partial result $\{4\}$
• $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
• $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e. the antichain $\{C1, C13\}$
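The naive procedure can be replayed in Python on the tour and depth arrays of the slide. In this sketch, "T" stands for the top element, a linear scan replaces a real RMQ structure, and the helper names are hypothetical; the final filtering step keeps only the most specific nodes, which is how the result reduces to two elements.

```python
# Euler tour of the taxonomy and node depths, copied from the slide (1-based positions).
TOUR = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
        "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

def lcs_position(i, j):
    """1-based position of the shallowest node between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    # A linear scan stands in for a constant-time RMQ structure.
    best = min(range(lo - 1, hi), key=DEPTH.__getitem__)
    return best + 1

def is_ancestor(u, v):
    """True if u is a proper ancestor of v: v occurs strictly inside u's span of the tour."""
    first = TOUR.index(u)
    last = len(TOUR) - 1 - TOUR[::-1].index(u)
    return u != v and first < TOUR.index(v) < last

def intersect(a_positions, b_positions):
    """Naive antichain intersection: one query per pair, then keep the most specific nodes."""
    nodes = {TOUR[lcs_position(i, j) - 1] for i in a_positions for j in b_positions}
    return {n for n in nodes if not any(is_ancestor(n, m) for m in nodes)}

result = intersect([4, 12, 20], [4, 18, 22])
```

The nested loop issues one query per pair of positions, which is where the quadratic number of RMQs of the naive approach comes from.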

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain.

Taxonomy:

    ⊤
      C12
        C10
          C1
          C2
        C11
          C4
          C5
      C6
        C13
          C7
          C8
          C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
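The scaling to filters can be checked with a few lines of Python. The `PARENT` map encodes the taxonomy of the slide, with "T" standing for the top element; the helper names are hypothetical and the code is only a sketch of the idea.

```python
# Parent of each node in the taxonomy of the slide; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T", "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def filter_of(antichain):
    """Fil(A): every element above or equal to some member of the antichain."""
    return set().union(*(up_set(c) for c in antichain))

def minimal_elements(nodes):
    """Elements of nodes that have no strict descendant inside nodes."""
    return {n for n in nodes if not any(m != n and n in up_set(m) for m in nodes)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
antichain = minimal_elements(common)
```

Every filter contains the whole path up to the top element, so its size grows with the size of the tree rather than with the size of the antichain.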

Complexity

Complexity of the naive approach

The number of RMQs is $O(|A||B|)$, one per pair of positions from $A$ and $B$.

Complexity of the improved approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of the associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About the complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation amp Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 56: Interactive Knowledge Discovery over Web of Data

Lattice-Based View Access [Alam et al CLA 2014]

SPARQL query for extracting the papers on Web Crawling and RDF from DBLPSELECT title author keyword WHERE

paper dctitle title

paper dccreator author

paper dcsubject keyword

filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) || (keyword ldquoRDFrdquoldquoirdquo))

VIEW BY title

title keyword author

title1 RDF author1title2 Web Crawling author1title2 Web Search Engines author2title2 RDF author1title2 RDF author2title2 Web Crawling author1

sbquo Ev ldquo ttitleu

sbquo Av ldquo tauthor keywordu

sbquo G ldquo micropEvq ldquo ttitle1 title2 u

sbquo M1 ldquo micropAv1q ldquo tauthor1 author2 u

sbquo M2 ldquo micropAv2q ldquo tWebCrawling RDF u

Dening Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam

Amedeo Napoli International Conference on Concept Lattice and Their Applications 2014

Mehwish Alam Interactive Knowledge Discovery over Web of Data 32

Views from Dierent Perspectives

Example

SELECT title author keyword WHERE

paper dctitle title

paper dccreator author

paper dcsubject keyword

filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||

(keyword ldquoRDFrdquoldquoirdquo))

VIEW BY title

Example

SELECT author title keyword WHERE

paper dctitle title

paper dccreator author

paper dcsubject keyword

filter regex((keyword ldquoWeb Crawlingrdquoldquoirdquo) ||

(keyword ldquoRDFrdquoldquoirdquo))

VIEW BY author

Mehwish Alam Interactive Knowledge Discovery over Web of Data 33

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 34

Mehwish Alam Interactive Knowledge Discovery over Web of Data 35

5Completing RDFData

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x

dctermssubject

SELECT x

WHERE

x dctermssubject categoryFrenchFilm

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x France

Film

dctermssubject

rdftype

dbohasCountry

SELECT x

WHERE

x rdftype dbpCountry

x dbohasCountry dbpFrance

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

From RDF Triples to Formal Context

RDF triples

ltPerson1dcsubjectdbpcComputer_Scientistsgt

ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt

ltPerson1dbpfielddbpComputer Sciencesgt

ltPerson1rdftypedboScientistsgt

Predicates ObjectsIndex URI Index URI

A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates

B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates

g dboUnitedKingdom

A B C D Ea b c d e f g

Person1 ˆ ˆ ˆ ˆ ˆ ˆ

Person2 ˆ ˆ ˆ ˆ ˆ

Person3 ˆ ˆ ˆ ˆ ˆ

Person4 ˆ ˆ ˆ ˆ

Person5 ˆ ˆ ˆ ˆ

Person6 ˆ ˆ

Person7 ˆ ˆ

The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt

Mehwish Alam Interactive Knowledge Discovery over Web of Data 37

Getting denitions from implications and association rules

sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data

Rule Condence Support Meaning

d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a

Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-

rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and

Computer Scientists are Scientists who have won Turing Award

Association rules for the running example

sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071

sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b

Mehwish Alam Interactive Knowledge Discovery over Web of Data 38

Possible Scenario [Alam et al IJCAI 2015]

ReferenceUniverse

FormalContext

MiningImplications

RankingImplications

Can a rule be adenitionX rdquo Y

Yes

CompleteData

Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov

Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 39

Experimentation

Dataset Cars Videogames Smartphones Countries

Dataset building conditions

Restriction dcsubject dcsubject dcsubject rdftype

dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype

dcsubject dcsubject dcsubject dcsubject

bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset Cars Videogames Smartphones Countries

Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982

Dataset characteristics

Mehwish Alam Interactive Knowledge Discovery over Web of Data 40

Evaluation

sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall

sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions

00 02 04 06 08 10

06

07

08

09

10

Recall

Pre

cisi

on

CarsSmartphonesCountriesVideogames

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Roadmap

Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

Ways to utilize this structure
  - Data Completion
  - Data Analysis, Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data Analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
• How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.
• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relation, relations between classes, etc.
• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
• Perform large-scale experiments.

Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.
• They may contain different data types and structures, including dates, numbers, collections and strings.
• To deal with such data we use Pattern Structures.

[Figure: RDF graph of the entity 350GT with the edges dc:subject to dbpc:Sports_cars, dc:subject to dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini and dbo:productionYear to 1964 (xsd:date)]

Heterogeneous context with numeric values

Columns: K_A (a, b), K_B (c), K_C (d), K_D (e), K_E (f, g), K_dbo:productionStartYear

Entity              Crosses among a to g   dbo:productionStartYear
Reventon            × × × × × ×            ⟨[2008, 2008]⟩
Countach            × × × × ×              ⟨[1974, 1974]⟩
350GT               × × × × ×              ⟨[1963, 1963]⟩
400GT               × × × ×                ⟨[1965, 1965]⟩
Islero              × × × ×                ⟨[1967, 1967]⟩
Veneno              × ×                    ⟨[2012, 2012]⟩
Aventador Roadster  × ×                    -

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \times_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
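A minimal sketch of how two descriptions in such a product could be compared, assuming that object relations are described by sets of objects (compared by intersection), that literal relations are described by numeric intervals (compared by their convex hull), and that tuples in $H$ are compared component by component. These choices, the function names and the two sample descriptions are assumptions made for illustration.

```python
def meet_sets(x, y):
    """Similarity of two object-relation descriptions: the shared objects."""
    return frozenset(x) & frozenset(y)


def meet_intervals(x, y):
    """Similarity of two numeric intervals: the smallest interval covering both."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def meet_descriptions(d1, d2, component_meets):
    """Component-wise similarity of two tuples of the product H."""
    return tuple(m(a, b) for m, a, b in zip(component_meets, d1, d2))


# Hypothetical descriptions: one set-valued component and one interval component.
car_350gt = (frozenset("abcde"), (1963, 1963))
car_400gt = (frozenset("abcd"), (1965, 1965))

shared = meet_descriptions(car_350gt, car_400gt, (meet_sets, meet_intervals))
```

Keeping one similarity function per predicate means that a new kind of literal only needs its own function, while the product logic stays unchanged.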

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
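The closure computed on this slide can be replayed with a small sketch. The production years come from the heterogeneous context; the placement of the binary attributes a to g is hypothetical (only the number of crosses per car and the closures stated on the slide are known), and all function names are illustrative.

```python
# Binary attributes are a hypothetical placement; years are those of the context.
CARS = {
    "Reventon": ("abcdef", 2008),
    "Countach": ("abcdf", 1974),
    "350GT": ("abcde", 1963),
    "400GT": ("abcd", 1965),
    "Islero": ("abcd", 1967),
    "Veneno": ("ab", 2012),
    "Aventador Roadster": ("ab", None),  # no production start year recorded
}


def shared_attributes(cars):
    """Binary attributes common to all the given cars."""
    return set.intersection(*(set(CARS[c][0]) for c in cars))


def cars_having(attributes):
    """Cars whose binary description contains all the given attributes."""
    return {c for c, (attrs, _) in CARS.items() if attributes <= set(attrs)}


def year_interval(cars):
    """Smallest interval covering the years of the given cars (all must have one)."""
    years = [CARS[c][1] for c in cars]
    return (min(years), max(years))


def cars_within(interval):
    """Cars whose production start year lies inside the interval."""
    low, high = interval
    return {c for c, (_, y) in CARS.items() if y is not None and low <= y <= high}


A1 = {"350GT", "400GT", "Islero"}
binary_closure = cars_having(shared_attributes(A1))
interval_closure = cars_within(year_interval(A1))
heterogeneous_closure = binary_closure & interval_closure
```

The binary part alone cannot separate the three cars from Reventon and Countach; the interval component is what restricts the extent back to $A_1$.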

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.
• The extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$.
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Entities S, attributes C1 C2 C4 C5 C6 C7 C8 C9:
s1  x x x
s2  x x x
s3  x x
s4  x x x
s5  x x

[Figure: subsumption hierarchy over the nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Any order ideal $Idl$ can then be described by the set $\mathrm{Max}(Idl)$ of its maximal elements: $Idl = \{x \mid \exists y \in \mathrm{Max}(Idl),\ x \leq y\}$.
• The maximal elements of an ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
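As an illustration of constant-time range minimum queries, here is a minimal sketch using a sparse table. Note that this is a simpler, swapped-in technique: its preprocessing takes $O(n \log n)$ time rather than the $O(n)$ bound quoted above, while each query is still $O(1)$. The function names are illustrative, and positions in the code are 0-based whereas the slide counts from 1.

```python
def build_sparse_table(values):
    """table[k][i] is the position of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    span = 1
    while 2 * span <= n:
        previous = table[-1]
        level = []
        for i in range(n - 2 * span + 1):
            left, right = previous[i], previous[i + span]
            level.append(left if values[left] <= values[right] else right)
        table.append(level)
        span *= 2
    return table


def range_min_position(values, table, lo, hi):
    """Position of the minimum of values[lo..hi], both bounds inclusive."""
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
whole_range = range_min_position(array, table, 0, 4) + 1  # 1-based position
last_two = range_min_position(array, table, 3, 4) + 1
```

Each query combines two overlapping windows whose lengths are the same power of two, which is why no loop is needed at query time.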

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the traversal array are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D:   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Examples of queries:
  - $\mathrm{RMQ}(4, 4) = 4$
  - $\mathrm{RMQ}(4, 18) = 15$
  - $\mathrm{RMQ}(20, 18) = 19$

Result $= \{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$.
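The pairwise computation can be sketched as follows. The tree is read off the traversal array of the slide; `TOP` stands for the root ⊤, the range minimum is found by a plain scan rather than by a preprocessed RMQ structure, and the final step that keeps only the most specific candidates is my reading of how the result reduces to $\{4, 21\}$. All names are illustrative.

```python
from itertools import product

# Concept tree as parent -> children; "TOP" stands for the root.
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Visit the tree depth first, recording a parent again after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour(TREE, "TOP")
FIRST = {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)  # 0-based; the slide counts from 1


def lcs(a, b):
    """Least common subsumer: the shallowest node between the two positions."""
    lo, hi = sorted((FIRST[a], FIRST[b]))
    return NODES[min(range(lo, hi + 1), key=lambda p: DEPTHS[p])]


def most_specific(candidates):
    """Drop every candidate that subsumes another candidate."""
    return {c for c in candidates
            if not any(o != c and lcs(c, o) == c for o in candidates)}


def intersect_naive(A, B):
    """One LCS per pair (a, b), i.e. |A| * |B| range minimum queries."""
    return most_specific({lcs(a, b) for a, b in product(A, B)})


result = intersect_naive({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

With the sets of the slide the result is the antichain made of C1 (position 4) and C13 (position 21).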

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain.

[Figure: tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
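The filter-based computation can be replayed with a short sketch. The tree is stored as a child-to-parent map with `TOP` standing for ⊤; the function names are illustrative.

```python
# Child -> parent map of the tree; "TOP" (the root) has no entry.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def upset(node):
    """The node together with all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def filter_of(antichain):
    """All elements lying above (or equal to) some element of the antichain."""
    return set().union(*(upset(node) for node in antichain))


def antichain_of(nodes):
    """Keep the most specific nodes: those that are not above another node."""
    return {n for n in nodes
            if not any(m != n and n in upset(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
result = antichain_of(common)
```

Each filter is materialized in full here, which is what makes this approach proportional to the size of the tree rather than to the size of the antichains.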

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.
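A sketch of how the improved approach might look, assuming that the positions of $A$ and $B$ are merged in sorted order and that one range minimum query is issued for each pair of neighbours coming from different sets. This reading is an assumption: it reproduces the candidate positions $\{4, 8, 15, 19, 21\}$ of the running example, but the details of the original algorithm are not given in the slides. The range minimum is again a plain scan and all names are illustrative.

```python
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Visit the tree depth first, recording a parent again after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour(TREE, "TOP")
FIRST = {}
for position, node in enumerate(NODES):
    FIRST.setdefault(node, position)  # 0-based positions


def rmq(lo, hi):
    """Position of the minimal depth in DEPTHS[lo..hi] (plain scan)."""
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p])


def candidate_positions(A, B):
    """One query per pair of neighbouring positions that come from different sets."""
    tagged = sorted([(FIRST[a], "A") for a in A] + [(FIRST[b], "B") for b in B])
    return {rmq(p, q) for (p, s), (q, t) in zip(tagged, tagged[1:]) if s != t}


def intersect_consecutive(A, B):
    """Keep the candidates that do not subsume another candidate."""
    found = {NODES[p] for p in candidate_positions(A, B)}

    def subsumes(x, y):
        lo, hi = sorted((FIRST[x], FIRST[y]))
        return x != y and NODES[rmq(lo, hi)] == x

    return {c for c in found if not any(subsumes(c, o) for o in found)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
positions = {p + 1 for p in candidate_positions(A, B)}  # 1-based, as on the slides
result = intersect_consecutive(A, B)
```

The merged list has at most $|A| + |B|$ entries, so at most $|A| + |B| - 1$ queries are issued, in line with the bound stated above.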

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.


Views from Different Perspectives

Example

    SELECT ?title ?author ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?title

Example

    SELECT ?author ?title ?keyword WHERE {
      ?paper dc:title ?title .
      ?paper dc:creator ?author .
      ?paper dc:subject ?keyword .
      FILTER (regex(?keyword, "Web Crawling", "i") ||
              regex(?keyword, "RDF", "i"))
    }
    VIEW BY ?author


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern in which x has rdf:type Person, dbp:birthPlace Berlin and a dbp:birthDate ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: x linked by dcterms:subject to FrenchFilm, next to the pattern in which x has rdf:type Film and dbo:hasCountry France]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

    <Person1, dc:subject, dbpc:Computer_Scientists>
    <Person1, dc:subject, dbpc:Turing_Award_Laureates>
    <Person1, dbp:field, dbp:Computer Sciences>
    <Person1, rdf:type, dbo:Scientist>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Formal context over the attributes A (a, b), B (c), C (d), D (e), E (f, g):

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
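A minimal sketch of this scaling step, assuming that every (predicate, object) pair becomes one binary attribute of the subject. The triples of Person1 are those listed above; the triples of Person6 and all function names are hypothetical.

```python
from collections import defaultdict


def build_formal_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)


def extent(context, attributes):
    """Subjects that have all the given attributes."""
    return {s for s, attrs in context.items() if attributes <= attrs}


def intent(context, subjects):
    """Attributes shared by all the given subjects."""
    return set.intersection(*(context[s] for s in subjects))


triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
    # Hypothetical entity described by its two categories only.
    ("Person6", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person6", "dc:subject", "dbpc:Turing_Award_Laureates"),
]

context = build_formal_context(triples)
scientists = extent(context, {("rdf:type", "dbo:Scientist")})
shared = intent(context, {"Person1", "Person6"})
```

Here `shared` contains exactly the two category attributes, which is the kind of situation where an entity such as Person6 lacks the rdf:type triple.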

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data.

Association rules for the running example:

Rule                 Confidence  Support  Meaning
$d \Rightarrow c$    100%        5        Every scientist has won a Turing Award.
$c \Rightarrow d$    100%        5        Every person who has won a Turing Award is a scientist.
$e \Rightarrow cd$   100%        2        All the people having the field computer science are Turing Award winner scientists.
$cd \Rightarrow ab$  100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists.
$ab \rightarrow cd$  71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award.

• Example:
  - $\{c, d\} \Rightarrow \{a, b\}$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description.
• In fact there is such a definition: $\{a, b\} \equiv \{c, d\}$, i.e. $\{a, b\} \Rightarrow \{c, d\}$ and $\{c, d\} \Rightarrow \{a, b\}$.
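The rule measures above can be recomputed with a short sketch. The context below is a hypothetical placement of the crosses that agrees with the supports and confidences of the table (the exact positions of e, f and g are not recoverable from the slides); the confidence threshold and the function names are also assumptions.

```python
# Hypothetical context consistent with the supports and confidences of the table.
CONTEXT = {
    "Person1": set("abcdef"),
    "Person2": set("abcde"),
    "Person3": set("abcdg"),
    "Person4": set("abcd"),
    "Person5": set("abcd"),
    "Person6": set("ab"),
    "Person7": set("ab"),
}


def support(attributes):
    """Number of entities having all the given attributes."""
    return sum(1 for intent in CONTEXT.values() if attributes <= intent)


def confidence(premise, conclusion):
    """Fraction of the entities satisfying the premise that also satisfy the conclusion."""
    covered = support(premise)
    return support(premise | conclusion) / covered if covered else 0.0


def classify(x, y, threshold=0.7):
    """Label the pair (X, Y) as a definition, a candidate definition, or neither."""
    forward, backward = confidence(x, y), confidence(y, x)
    if forward == 1.0 and backward == 1.0:
        return "definition"
    if max(forward, backward) == 1.0 and min(forward, backward) >= threshold:
        return "candidate definition"
    return "no definition"


ab, cd = set("ab"), set("cd")
conf_ab_cd = round(confidence(ab, cd), 2)
conf_cd_ab = confidence(cd, ab)
label = classify(ab, cd)
incomplete = sorted(e for e, intent in CONTEXT.items() if ab <= intent and not cd <= intent)
```

The entities listed in `incomplete` are those that satisfy one side of the candidate definition but not the other, which is how Person6 and Person7 are singled out for completion.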

Mehwish Alam Interactive Knowledge Discovery over Web of Data 38

Possible Scenario [Alam et al IJCAI 2015]

ReferenceUniverse

FormalContext

MiningImplications

RankingImplications

Can a rule be adenitionX rdquo Y

Yes

CompleteData

Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov

Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 39

Experimentation

Dataset Cars Videogames Smartphones Countries

Dataset building conditions

Restriction dcsubject dcsubject dcsubject rdftype

dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype

dcsubject dcsubject dcsubject dcsubject

bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset Cars Videogames Smartphones Countries

Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982

Dataset characteristics

Mehwish Alam Interactive Knowledge Discovery over Web of Data 40

Evaluation

sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall

sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions

00 02 04 06 08 10

06

07

08

09

10

Recall

Pre

cisi

on

CarsSmartphonesCountriesVideogames

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 42

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6DataAnalysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45

main_topic_orpailleurmp4
Media File (videomp4)
altering_nav_spacemp4
Media File (videomp4)
nav_across_point_of_viewmp4
Media File (videomp4)
hide_partsmp4
Media File (videomp4)
search_concept_latticemp4
Media File (videomp4)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7Conclusion and Perspectives

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 50

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 51

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 52

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37–54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129–142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342–1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190–208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153–168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2–21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449–473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may contain different data types and structures, including dates, numbers, collections, and strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph of the entity 350GT, with edges dc:subject to dbpc:Sports_cars, dc:subject to dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date)]

Why Pattern Structures

Entity              Crosses over a-g (K_A, K_B, K_C, K_D, K_E)   K_dbo:productionStartYear
Reventon            × × × × × ×                                   ⟨[2008, 2008]⟩
Countach            × × × × ×                                     ⟨[1974, 1974]⟩
350GT               × × × × ×                                     ⟨[1963, 1963]⟩
400GT               × × × ×                                       ⟨[1965, 1965]⟩
Islero              × × × ×                                       ⟨[1967, 1967]⟩
Veneno              × ×                                           ⟨[2012, 2012]⟩
Aventador Roadster  × ×                                           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple in which each component is a description from a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
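The closure computed on this slide can be replayed with a minimal Python sketch. The production years come from the table; the exact placement of the crosses is not recoverable from the slide, so the binary rows below are an assumption chosen only to agree with the stated derivations. All function names are hypothetical.

```python
# Binary part of the context. ASSUMPTION: the slide only tells us that 350GT,
# 400GT and Islero share exactly {a, b, c, d}; other cross placements are made up.
BINARY = {
    "Reventon": {"a", "b", "c", "d", "e", "f"},
    "Countach": {"a", "b", "c", "d", "e"},
    "350GT": {"a", "b", "c", "d", "e"},
    "400GT": {"a", "b", "c", "d"},
    "Islero": {"a", "b", "c", "d"},
    "Veneno": {"c", "d"},              # assumed placement
    "Aventador Roadster": {"c", "d"},  # assumed placement
}

# Numeric part: production start years from the slide.
# Aventador Roadster has no value in the table, so it has no entry here.
YEAR = {"Reventon": 2008, "Countach": 1974, "350GT": 1963,
        "400GT": 1965, "Islero": 1967, "Veneno": 2012}


def common_attributes(entities):
    """Binary derivation: attributes shared by all given entities."""
    return set.intersection(*(BINARY[g] for g in entities))


def entities_with(attributes):
    """Binary derivation: entities having all given attributes."""
    return {g for g, attrs in BINARY.items() if attributes <= attrs}


def interval_of(entities):
    """Interval derivation: smallest interval covering the entities' years.
    Every entity passed in must have a year."""
    years = [YEAR[g] for g in entities]
    return (min(years), max(years))


def entities_within(interval):
    """Interval derivation: entities whose year lies inside the interval."""
    low, high = interval
    return {g for g, year in YEAR.items() if low <= year <= high}


def heterogeneous_closure(entities):
    """Intersect the closure in the binary part with the closure in the numeric part."""
    binary_closure = entities_with(common_attributes(entities))
    numeric_closure = entities_within(interval_of(entities))
    return binary_closure & numeric_closure


A1 = {"350GT", "400GT", "Islero"}
print(common_attributes(A1), interval_of(A1), heterogeneous_closure(A1))
```

Under these assumptions the binary closure alone is too large (it also contains Reventon and Countach), and it is the interval component that cuts the extent back to A1.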

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^{*}$, organized in a subsumption hierarchy based on a partial ordering $\le_{M^{*}}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S   Crosses over C1 C2 C4 C5 C6 C7 C8 C9
s1           × × ×
s2           × × ×
s3           × ×
s4           × × ×
s5           × ×

[Figure: subsumption hierarchy with top element ⊤ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as one instantiation

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set $M$ of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$

• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$
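The infimum of two antichains can be sketched by brute force directly from the definition above. This is only an illustration: the hierarchy is the tree used in the later slides (with `"T"` standing for ⊤), and the order direction is an assumption, namely that $x \le y$ holds when $x$ equals $y$ or is an ancestor of $y$. The names `PARENT`, `leq` and `antichain_meet` are hypothetical.

```python
# Tree from the later slides; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}
ELEMENTS = set(PARENT) | {"T"}


def leq(x, y):
    """ASSUMED order: x <= y when x equals y or x is an ancestor of y."""
    while y is not None:
        if x == y:
            return True
        y = PARENT.get(y)
    return False


def antichain_meet(ac1, ac2):
    """Maximal elements of {m | exists a in ac1, b in ac2 with m <= a and m <= b}."""
    ideal = {
        m for m in ELEMENTS
        if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)
    }
    # Keep only elements that are not strictly below another element of the ideal.
    return {m for m in ideal if not any(m != n and leq(m, n) for n in ideal)}


print(antichain_meet({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

The brute-force version scans every element of the hierarchy, which is what the RMQ-based approach in the following slides is meant to avoid.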

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
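As an illustration of constant-time range minimum queries, here is a sketch using a sparse table. Note that this is a simpler, swapped-in technique: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted on the slide, which requires a more involved construction. Positions in the code are 0-based, and the function names are hypothetical.

```python
def build_sparse_table(values):
    """table[j][i] is the position of a minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Position (0-based) of a minimum of values[lo..hi], both ends inclusive."""
    if lo > hi:
        lo, hi = hi, lo
    j = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**j cover the whole range.
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 0, 4) + 1)  # 1-based position, as on the slide
```

For the array on the slide, the minimum over the whole range sits at 1-based position 3.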

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D:         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

• $RMQ(4, 4) = 4$, partial result $\{4\}$

• $RMQ(4, 18) = 15$, partial result $\{4, 15\}$

• $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$

• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
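The naive approach can be replayed with a short Python sketch, assuming the tree below (read off the node row of the slide, with `"T"` standing for ⊤). It builds the Euler tour and depth array, runs one RMQ per pair, and then keeps only the most specific nodes. The RMQ here is a plain linear scan for readability, and all names are hypothetical.

```python
# Tree read off the Euler tour on the slide; "T" stands for the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the visited nodes and their depths, revisiting a node after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, i, j):
    """1-based position of the leftmost minimum of depths between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def naive_intersection(a_nodes, b_nodes, nodes, depths):
    first, last = {}, {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)
        last[node] = pos
    # One RMQ per pair (a, b): |A| * |B| queries in total.
    positions = {rmq(depths, first[a], first[b]) for a in a_nodes for b in b_nodes}
    candidates = {nodes[p - 1] for p in positions}

    def is_strict_ancestor(u, v):
        return u != v and first[u] <= first[v] and last[v] <= last[u]

    # Drop every candidate that is an ancestor of another candidate.
    result = {u for u in candidates
              if not any(is_strict_ancestor(u, v) for v in candidates)}
    return positions, result


nodes, depths = euler_tour("T")
positions, result = naive_intersection(["C1", "C5", "C8"], ["C1", "C7", "C9"], nodes, depths)
print(sorted(positions), sorted(result))
```

The candidate positions match the intermediate result on the slide, and the surviving nodes are the ones found at positions 4 and 21.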

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

⊤
├─ C12
│  ├─ C10
│  │  ├─ C1
│  │  └─ C2
│  └─ C11
│     ├─ C4
│     └─ C5
└─ C6
   └─ C13
      ├─ C7
      ├─ C8
      └─ C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
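The scaling-based intersection is easy to sketch in Python, assuming the tree above is stored as a child-to-parent map (with `"T"` standing for ⊤). The function names are hypothetical.

```python
# Child -> parent map of the tree on the slide; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes lying on a path from an antichain element up to the root."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def intersect_by_scaling(a, b):
    common = filter_of(a) & filter_of(b)
    # The intersection is closed under taking parents, so a node has a descendant
    # in it exactly when it is the parent of some member.
    parents = {PARENT[n] for n in common if n in PARENT}
    return common, common - parents


common, antichain = intersect_by_scaling({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(common), sorted(antichain))
```

Each filter can be as large as the whole tree, which is the cost discussed on the complexity slide.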

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than the $O(|Leaves(T)|)$ needed for intersecting antichains directly
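The slides do not spell out the improved approach, so the following Python sketch is only one plausible reading of it: merge the positions of A and B in sorted order and issue an RMQ only between neighbours that come from different sets. The depth array is copied from the naive-approach slide; the function names and the tagging scheme are assumptions.

```python
# Depth array D of the Euler tour, copied from the naive-approach slide.
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(depths, i, j):
    """1-based position of the leftmost minimum between positions i and j.
    A linear scan for readability; a constant-time RMQ structure would replace it."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def improved_candidates(a_positions, b_positions, depths):
    """ASSUMED reading of the improved approach: only consecutive elements of the
    merged, sorted positions are queried, and only when they come from different sets."""
    tagged = sorted([(p, "A") for p in a_positions] + [(p, "B") for p in b_positions])
    queries = 0
    candidates = set()
    for (p, side_p), (q, side_q) in zip(tagged, tagged[1:]):
        if side_p != side_q:
            candidates.add(rmq(depths, p, q))
            queries += 1
    return candidates, queries


candidates, queries = improved_candidates([4, 12, 20], [4, 18, 22], DEPTHS)
print(sorted(candidates), queries)
```

On the running example this issues 5 queries instead of the 9 pairwise ones and produces the same candidate positions as the naive approach; reducing them to the final antichain is unchanged.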

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: query graph with ?x rdf:type Person, ?x dbp:birthPlace Berlin, ?x dbp:birthDate ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: query graph with ?x dcterms:subject FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

[Figure: query graph with ?x rdf:type Film, ?x dbo:hasCountry France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientist>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Entity    Crosses over a-g (A: a, b; B: c; C: d; D: e; E: f, g)
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built, after scaling, from RDF triples taken from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>
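A minimal Python sketch of this scaling step, assuming triples are available as plain `(subject, predicate, object)` tuples: each distinct predicate-object pair becomes one binary attribute. The function names and the tuple representation are illustrative choices, not part of the original tooling.

```python
# The four triples of Person1 shown on the slide, as plain tuples.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientist"),
]


def build_context(triples):
    """Map each subject to the set of (predicate, object) attributes it has."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context


def cross_table(context):
    """Return the sorted attribute list and one row of booleans per entity."""
    attributes = sorted({attr for attrs in context.values() for attr in attrs})
    rows = {g: [attr in attrs for attr in attributes] for g, attrs in context.items()}
    return attributes, rows


context = build_context(TRIPLES)
attributes, rows = cross_table(context)
for entity, row in rows.items():
    print(entity, " ".join("x" if cell else "." for cell in row))
```

Each `True` cell in a row plays the role of a cross in the table above.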

Getting definitions from implications and association rules

• If $X \implies Y$ and $Y \implies X$, then $X \equiv Y$

• $X \equiv Y$ is called a definition

• If $X \implies Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete

Rule            Confidence  Support  Meaning
d ⟹ c           100%        5        Every scientist has won a Turing Award
c ⟹ d           100%        5        Every person who has won a Turing Award is a scientist
e ⟹ c, d        100%        2        Every person having the field computer science is a Turing Award winning scientist
c, d ⟹ a, b     100%        5        All scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
a, b → c, d     71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description

• In fact there is such a definition, $a, b \equiv c, d$, i.e., $a, b \implies c, d$ and $c, d \implies a, b$
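The confidence values and the entities needing completion can be reproduced with a small Python sketch. The slide does not give the exact cross placement for attributes e, f and g, so the context below is an assumption chosen to agree with the supports and confidences in the rule table; the function names are hypothetical.

```python
# ASSUMED context: consistent with the rule table (a, b in all 7 persons; c, d in
# Person1..Person5; e in two persons), but the placement of e, f, g is a guess.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "f"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities that have every attribute in the given set."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}


def support(attributes):
    return len(extent(attributes))


def confidence(premise, conclusion):
    """Fraction of the entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)


def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return extent(premise) - extent(conclusion)


ab, cd = {"a", "b"}, {"c", "d"}
print(confidence(cd, ab), round(confidence(ab, cd), 2), sorted(needs_completion(ab, cd)))
```

With this context, the rule from c, d to a, b is an implication, the converse holds with confidence 5/7, and the two entities violating it are exactly the ones flagged on the slide.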

Possible Scenario [Alam et al. IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset      Restriction                      Predicates
Cars         dc:subject dbpc:Sports_cars      rdf:type, dc:subject, bodyStyle, transmission, assembly, designer, layout
Videogames   dc:subject dbpc:FPS              rdf:type, dc:subject, cp, developer, requirement, genre, releaseDate
Smartphones  dc:subject dbpc:Smartphones      rdf:type, dc:subject, manufacturer, operativeSystem, developer, cpu
Countries    rdf:type Country                 rdf:type, dc:subject, language, governmentType, leaderType, foundingDate, gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames  Smartphones  Countries
Subjects        529     655         363          3153
Objects         1291    3265        495          8315
Triples         12519   20146       4710         50000
Concepts        14657   31031       1232         13754
Exec. time [s]  17.32   17.14       0.7          59.82

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset

• Human evaluation as the ground truth: each list of implications has 100% recall

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) versus recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]


6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team

• Altering the navigation space

• Navigation across points of view

• Hiding non-interesting parts of the lattice

• Search capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account

• Perform large-scale experiments

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 50

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 51

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 52

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8References

References I

Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis

Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754

Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer

Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 56

References II

Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial

Intelligence pages 13421347

Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13

2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer

Mehwish Alam Interactive Knowledge Discovery over Web of Data 57

References III

Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June

23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer

Science pages 153168 Springer

Rudolph S (2006)Relational exploration combining description logics and formal concept

analysis for knowledge specicationPhD thesis Dresden University of Technology

Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 58

References IV

Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473

Mehwish Alam Interactive Knowledge Discovery over Web of Data 59

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 64

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 65

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 66

The proposition of [Ganter and Kuznetsov 2001]

sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets

sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order

sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu

sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal

sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable

Mehwish Alam Interactive Knowledge Discovery over Web of Data 67

The proposition of [Ganter and Kuznetsov 2001]

sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set

sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query

sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array

Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5

Computational Time

The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D:        0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:     ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

$RMQ(4, 4) = 4$, giving $\{4\}$
$RMQ(4, 18) = 15$, giving $\{4, 15\}$
$RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
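The pairwise procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the tree is re-encoded by hand as a hypothetical `CHILDREN` map (with `TOP` standing for the root), LCS is read as the lowest common ancestor in the tree, and the range minimum is found by a plain linear scan instead of a preprocessed RMQ structure.

```python
# Hypothetical encoding of the slide's tree: parent -> ordered children.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}

def euler_tour(root, children):
    """Nodes and depths of the Euler tour; a node is revisited after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in children.get(node, ()):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths

def rmq_position(depths, i, j):
    """1-based position of the smallest depth between 1-based positions i and j."""
    lo, hi = min(i, j), max(i, j)
    window = depths[lo - 1:hi]
    return lo + window.index(min(window))

def naive_intersection(a, b, nodes, depths):
    first = {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)

    def lcs(x, y):
        return nodes[rmq_position(depths, first[x], first[y]) - 1]

    candidates = {lcs(x, y) for x in a for y in b}
    # Drop every candidate that is a strict ancestor of another candidate.
    return {c for c in candidates
            if not any(c != d and lcs(c, d) == c for d in candidates)}

NODES, DEPTHS = euler_tour("TOP", CHILDREN)
RESULT = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"}, NODES, DEPTHS)
```

With this encoding the tour has 25 entries and places `C1`, `C5` and `C8` at positions 4, 12 and 20, as on the slide.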

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain.

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
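The filter-based computation can be reproduced in a few lines. This is only a sketch: the `PARENT` map is a hand-made, hypothetical encoding of the tree (the root is written `TOP`), and the helper names are not taken from the presentation.

```python
# Hypothetical encoding of the slide's tree: child -> parent.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def filter_of(antichain, parent):
    """Every node that is equal to or above some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = parent.get(node)
    return result

def minimal_elements(nodes, parent):
    """Nodes of the set that have no strict descendant inside the set."""
    has_descendant = set()
    for node in nodes:
        up = parent.get(node)
        while up is not None:
            has_descendant.add(up)
            up = parent.get(up)
    return nodes - has_descendant

FIL_A = filter_of({"C1", "C5", "C8"}, PARENT)
FIL_B = filter_of({"C1", "C7", "C9"}, PARENT)
ANTICHAIN = minimal_elements(FIL_A & FIL_B, PARENT)
```

Both filters are built by walking up to the root, so their size grows with the height of the tree rather than with the size of the antichains.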

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.


5 Completing RDF Data

Problem Statement

People who were born in Berlin before 1900

[Figure: graph pattern where ?x has rdf:type Person, dbp:birthPlace Berlin and dbp:birthDate ≤ 1900]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Person .
      ?x dbp:birthPlace dbp:Berlin .
      ?x dbp:birthDate ?d .
      FILTER (?d <= 1900)
    }

French Films

[Figure: graph pattern where ?x has dcterms:subject FrenchFilm]

    SELECT ?x
    WHERE {
      ?x dcterms:subject category:FrenchFilm .
    }

[Figure: graph pattern where ?x has rdf:type Film and dbo:hasCountry France]

    SELECT ?x
    WHERE {
      ?x rdf:type dbo:Film .
      ?x dbo:hasCountry dbp:France .
    }

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates               Objects
Index  URI               Index  URI
A      dc:subject        a      dbpc:Computer_Scientists
                         b      dbpc:Turing_Award_Laureates
B      dbp:award         c      dbp:TuringAward
C      rdf:type          d      dbo:Scientist
D      dbp:field         e      dbp:Computer Sciences
E      dbp:birthPlace    f      dbo:UnitedStates
                         g      dbo:UnitedKingdom

Columns: A (a, b), B (c), C (d), D (e), E (f, g)

Person1  × × × × × ×
Person2  × × × × ×
Person3  × × × × ×
Person4  × × × ×
Person5  × × × ×
Person6  × ×
Person7  × ×

The formal context is built from RDF triples from DBpedia after scaling. Each cross (×) corresponds to a triple <subject, predicate, object>.
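The step from triples to a formal context can be sketched as below. This is a minimal illustration, assuming each (predicate, object) pair becomes one binary attribute. The `Person1` triples are the ones listed on the slide; the `Person2` triples and all function names are hypothetical and only serve the example.

```python
from collections import defaultdict

# Person1 triples come from the slide; Person2 triples are hypothetical.
TRIPLES = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
    ("Person2", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person2", "rdf:type", "dbo:Scientists"),
]

def build_context(triples):
    """Map each subject to its set of (predicate, object) attributes."""
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        context[subject].add((predicate, obj))
    return dict(context)

def extent(attributes, context):
    """Subjects whose description contains every given attribute."""
    return {s for s, description in context.items() if attributes <= description}

def intent(entities, context):
    """Attributes shared by all given subjects (empty set for no subjects, for simplicity)."""
    descriptions = [context[s] for s in entities]
    return set.intersection(*descriptions) if descriptions else set()

CONTEXT = build_context(TRIPLES)
SHARED = intent({"Person1", "Person2"}, CONTEXT)
```

Each pair stored in `CONTEXT[subject]` plays the role of one cross in the table.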

Getting definitions from implications and association rules

• If $X \Longrightarrow Y$ and $Y \Longrightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Longrightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data.

Rule                          Confidence  Support  Meaning
$d \Longrightarrow c$         100%        5        Every scientist has won a Turing Award.
$c \Longrightarrow d$         100%        5        Every person who has won a Turing Award is a scientist.
$e \Longrightarrow c, d$      100%        2        All the people having the field computer science are Turing Award winning scientists.
$c, d \Longrightarrow a, b$   100%        5        All the scientists winning the Turing Award are categorized as Turing Award Laureates and Computer Scientists.
$a, b \rightarrow c, d$       71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won the Turing Award.

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description.
• In fact, there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Longrightarrow c, d$ and $c, d \Longrightarrow a, b$.
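The confidence values of the running example can be checked with a short sketch. The context below is an assumption: it is one assignment of crosses that is consistent with the rule table (Person1 to Person5 carry a, b, c, d; Person6 and Person7 carry only a and b), but the exact placement of e and f is guessed. Function names are hypothetical.

```python
# Assumed context, consistent with the supports and confidences on the slide.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e", "f"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d", "f"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def extent(attributes, context):
    """Entities whose description contains every given attribute."""
    return {g for g, description in context.items() if attributes <= description}

def support(premise, context):
    """Number of entities satisfying the premise."""
    return len(extent(premise, context))

def confidence(premise, conclusion, context):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return len(extent(premise | conclusion, context)) / support(premise, context)

def needs_completion(premise, conclusion, context):
    """Entities that break the rule premise -> conclusion."""
    return extent(premise, context) - extent(premise | conclusion, context)

CONF_AB_CD = confidence({"a", "b"}, {"c", "d"}, CONTEXT)
TO_COMPLETE = needs_completion({"a", "b"}, {"c", "d"}, CONTEXT)
```

Under this assumed context, the entities returned by `needs_completion` are the candidates for adding c and d when the rule is accepted as one half of a definition.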

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

             Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

               Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation serves as the ground truth: each list of implications has 100% recall.

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)


RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.

Thank you for your attention


8 References

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections, strings.

• To deal with such data we use Pattern Structures.

[Figure: RDF graph for the entity 350GT, with dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini and dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Columns: K_A (a, b), K_B (c), K_C (d), K_D (e), K_E (f, g), K_dbo:productionStartYear

Reventon            × × × × × ×  $\langle [2008, 2008] \rangle$
Countach            × × × × ×    $\langle [1974, 1974] \rangle$
350GT               × × × × ×    $\langle [1963, 1963] \rangle$
400GT               × × × ×      $\langle [1965, 1965] \rangle$
Islero              × × × ×      $\langle [1967, 1967] \rangle$
Veneno              × ×          $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×          -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $range(p) \subseteq U$, $p$ is an object relation;
  - if $range(p) \subseteq L$, $p$ is a literal relation.

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$;
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$.

• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \times D_p \ (p \in P)$ is the Cartesian product of all the description sets $D_p$;
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$.

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$ in the heterogeneous context with numeric values; then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)^{\Box\Box}$
  - $(A_1)^{\Box} = [1963, 1967]$ and $[1963, 1967]^{\Box} = A_1$
  - $(A_1)^{\Box\Box} = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
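The numeric part of this example can be retraced with a small sketch of interval patterns. It is a simplified illustration rather than the authors' implementation: the `YEARS` map copies the production start years from the context, Aventador Roadster is left out because its value is missing, the closure on the binary side is taken as given from the slide, and all names are hypothetical.

```python
# Production start years copied from the heterogeneous context.
# Aventador Roadster has no value there, so it is omitted from this sketch.
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

# Closure of A1 on the binary attributes, taken as given from the slide.
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

def common_interval(entities, years):
    """Smallest interval covering the values of all given entities."""
    values = [years[e] for e in entities]
    return (min(values), max(values))

def entities_in(interval, years):
    """Entities whose value falls inside the interval (bounds included)."""
    low, high = interval
    return {e for e, year in years.items() if low <= year <= high}

A1 = {"350GT", "400GT", "Islero"}
INTERVAL = common_interval(A1, YEARS)         # numeric description of A1
NUMERIC_CLOSURE = entities_in(INTERVAL, YEARS)
HETEROGENEOUS_CLOSURE = BINARY_CLOSURE & NUMERIC_CLOSURE
```

The interval is what removes Reventon and Countach: they share the binary attributes with the three cars of $A_1$ but fall outside 1963 to 1967.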

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Entities S  C1 C2 C4 C5 C6 C7 C8 C9
s1          x x x
s2          x x x
s3          x x
s4          x x x
s5          x x

[Figure: subsumption hierarchy over the attributes, with top element ⊤ and nodes C1, C2, C4 to C15]




Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x

dctermssubject

SELECT x

WHERE

x dctermssubject categoryFrenchFilm

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x France

Film

dctermssubject

rdftype

dbohasCountry

SELECT x

WHERE

x rdftype dbpCountry

x dbohasCountry dbpFrance

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

From RDF Triples to Formal Context

RDF triples

ltPerson1dcsubjectdbpcComputer_Scientistsgt

ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt

ltPerson1dbpfielddbpComputer Sciencesgt

ltPerson1rdftypedboScientistsgt

Predicates ObjectsIndex URI Index URI

A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates

B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates

g dboUnitedKingdom

A B C D Ea b c d e f g

Person1 ˆ ˆ ˆ ˆ ˆ ˆ

Person2 ˆ ˆ ˆ ˆ ˆ

Person3 ˆ ˆ ˆ ˆ ˆ

Person4 ˆ ˆ ˆ ˆ

Person5 ˆ ˆ ˆ ˆ

Person6 ˆ ˆ

Person7 ˆ ˆ

The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt

Mehwish Alam Interactive Knowledge Discovery over Web of Data 37

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$, then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, the data may be incomplete.

Rule                      Confidence  Support  Meaning
$d \Rightarrow c$         100%        5        Every scientist has won a Turing Award.
$c \Rightarrow d$         100%        5        Every person who has won a Turing Award is a scientist.
$e \Rightarrow c, d$      100%        2        Everyone whose field is computer science is a scientist who has won a Turing Award.
$c, d \Rightarrow a, b$   100%        5        All the scientists who have won a Turing Award are categorized as Turing Award Laureates and Computer Scientists.
$a, b \rightarrow c, d$   71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award.

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $\mathrm{conf}(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description.
• In fact, there is such a definition: $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
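The confidence value in the example can be checked with a short sketch. The context below is an assumption: it keeps only the attributes a to d, with crosses inferred from the supports and confidences in the rule table (Person1 to Person5 carry a, b, c, d; Person6 and Person7 carry only a, b). The helper names are hypothetical.

```python
# Assumed context restricted to attributes a-d (columns e-g are omitted).
CONTEXT = {f"Person{i}": {"a", "b", "c", "d"} for i in range(1, 6)}
CONTEXT["Person6"] = {"a", "b"}
CONTEXT["Person7"] = {"a", "b"}


def support(context, itemset):
    """Number of entities whose description contains `itemset`."""
    return sum(1 for attrs in context.values() if itemset <= attrs)


def confidence(context, premise, conclusion):
    """Fraction of entities satisfying `premise` that also satisfy `conclusion`."""
    covered = support(context, premise)
    if covered == 0:
        raise ValueError("premise is not satisfied by any entity")
    return support(context, premise | conclusion) / covered


def completion_candidates(context, premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(
        g for g, attrs in context.items()
        if premise <= attrs and not conclusion <= attrs
    )


AB, CD = {"a", "b"}, {"c", "d"}
print(round(confidence(CONTEXT, AB, CD), 2))    # 5 of 7 entities
print(confidence(CONTEXT, CD, AB))              # implication, confidence 1.0
print(completion_candidates(CONTEXT, AB, CD))   # entities to complete
```

Under these assumptions the rule from {c, d} to {a, b} is an implication, while the converse reaches 5/7, and the two entities returned by `completion_candidates` are the ones the slide marks as needing completion.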

Possible Scenario [Alam et al., IJCAI 2015]

Reference Universe → Formal Context → Mining Implications → Ranking Implications → Can a rule be a definition $X \equiv Y$? → Yes → Complete Data

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

             Cars              Videogames   Smartphones       Countries
Restriction  dc:subject        dc:subject   dc:subject        rdf:type
             dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones  Country
Predicates   rdf:type          rdf:type     rdf:type          rdf:type
             dc:subject        dc:subject   dc:subject        dc:subject
             bodyStyle         cp           manufacturer      language
             transmission      developer    operativeSystem   govenmentType
             assembly          requirement  developer         leaderType
             designer          genre        cpu               foundingDate
             layout            releaseDate                    gdpPppRank

Abbreviations: FPS = First_Person_Shooters, cp = computerPlatform.

Predicates used to construct each dataset

Dataset Characteristics

               Cars   Videogames  Smartphones  Countries
Subjects       529    655         363          3153
Objects        1291   3265        495          8315
Triples        12519  20146       4710         50000
Concepts       14657  31031       1232         13754
Exec time [s]  1732   1714        07           5982

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation as the ground truth: each list of implications has a 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (y-axis, 0.6 to 1.0) against recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].
• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.
• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.
• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.
• Perform large-scale experiments.

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.
• They may have different data types and structures, including dates, numbers, collections, strings.
• To deal with such data we use Pattern Structures.

Example (350GT):
  <350GT, dc:subject, dbpc:Sports_cars>
  <350GT, dc:subject, dbpc:Lamborghini_vehicles>
  <350GT, rdf:type, dbo:Automobile>
  <350GT, dbo:manufacturer, dbp:Lamborghini>
  <350GT, dbo:productionYear, 1964 (xsd:date)>

Why Pattern Structures

Heterogeneous context with numeric values. Columns: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ over the attributes a to g, followed by $K_{dbo:productionStartYear}$.

Reventon            × × × × × ×  ⟨[2008, 2008]⟩
Countach            × × × × ×    ⟨[1974, 1974]⟩
350GT               × × × × ×    ⟨[1963, 1963]⟩
400GT               × × × ×      ⟨[1965, 1965]⟩
Islero              × × × ×      ⟨[1967, 1967]⟩
Veneno              × ×          ⟨[2012, 2012]⟩
Aventador Roadster  × ×          -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
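The numeric part of this example can be reproduced with a small sketch of an interval pattern structure. It assumes the production years listed in the heterogeneous context, takes the similarity of two intervals to be their convex hull, and leaves out Aventador Roadster because it has no value; the names `meet`, `describe` and `covered` are hypothetical, and `OBJECT_CLOSURE` is copied from the slide rather than computed.

```python
from functools import reduce

# Production start years from the heterogeneous context, as (low, high) intervals.
YEARS = {
    "Reventon": (2008, 2008),
    "Countach": (1974, 1974),
    "350GT": (1963, 1963),
    "400GT": (1965, 1965),
    "Islero": (1967, 1967),
    "Veneno": (2012, 2012),
}


def meet(i, j):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i[0], j[0]), max(i[1], j[1]))


def describe(entities):
    """Common interval description of a non-empty set of entities."""
    return reduce(meet, (YEARS[g] for g in entities))


def covered(interval):
    """Entities whose own interval lies inside `interval`."""
    low, high = interval
    return {g for g, (a, b) in YEARS.items() if low <= a and b <= high}


A1 = {"350GT", "400GT", "Islero"}
pattern = describe(A1)        # interval description of A1
numeric_closure = covered(pattern)

# Closure of A1 on the object-relation side, as given on the slide.
OBJECT_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}
print(pattern, OBJECT_CLOSURE & numeric_closure == A1)
```

Intersecting the two closures mirrors the last derivation step on the slide: the object-relation side alone would also admit Reventon and Countach, and the interval side is what rules them out.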

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.
• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S over the attributes C1, C2, C4, C5, C6, C7, C8, C9:

s1  x x x
s2  x x x
s3  x x
s4  x x x
s5  x x

[Figure: subsumption hierarchy with top element ⊤, nodes C10 to C15 and the attributes C1, C2, C4, C5, C6, C7, C8, C9]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$.
• The maximal elements of an ideal are pairwise not comparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable.
• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array:      [ 2  1  0  3  2 ]
Positions:    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
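A range minimum query structure can be sketched as below. Note the substitution: this uses a sparse table, which needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted on the slide, but keeps the $O(1)$ query time and is much shorter to write. Indices are 0-based in the code, whereas the slide numbers positions from 1; the function names are hypothetical.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the leftmost minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # blocks of length 1
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """0-based index of the leftmost minimum in values[lo..hi], both inclusive."""
    if lo > hi:
        lo, hi = hi, lo
    k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]  # the array from the slide
table = build_sparse_table(values)
print(rmq(values, table, 0, 4) + 1)  # 1-based position of the overall minimum
```

Each query combines two overlapping power-of-two blocks, which is valid because taking a minimum twice over the overlap does not change the result.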

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions in the array below are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Steps:
  RMQ(4, 4) = 4      result so far: {4}
  RMQ(4, 18) = 15    result so far: {4, 15}
  RMQ(20, 18) = 19   result so far: {4, 15, 19}

Result = {4, 8, 15, 19, 21}, which reduces to {4, 21}, i.e., the nodes C1 and C13.
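The pairwise computation can be replayed with the sketch below. It copies the node and depth arrays from the slide, writes the top node as `"T"` (an ASCII stand-in chosen here), and answers each RMQ by a plain linear scan rather than a preprocessed structure. The helper names are hypothetical, and the final reduction keeps the most specific nodes, which is an assumed reading of the last step on the slide.

```python
from itertools import product

# Euler tour of the tree as shown on the slide; "T" stands for the top node.
NODES = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def position(node):
    """1-based position of the first occurrence of `node` in the tour."""
    return NODES.index(node) + 1


def rmq(i, j):
    """1-based position of the leftmost minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def naive_candidates(a_nodes, b_nodes):
    """One RMQ per pair (a, b): |A| * |B| queries in total."""
    return sorted({rmq(position(a), position(b))
                   for a, b in product(a_nodes, b_nodes)})


def is_ancestor(u, v):
    """True if u is a proper ancestor of v, read off the Euler tour."""
    first_u = NODES.index(u)
    last_u = len(NODES) - 1 - NODES[::-1].index(u)
    return u != v and first_u < NODES.index(v) < last_u


def most_specific(positions):
    """Drop every candidate node that is an ancestor of another candidate."""
    nodes = {NODES[p - 1] for p in positions}
    return sorted(n for n in nodes if not any(is_ancestor(n, m) for m in nodes))


A, B = ["C1", "C5", "C8"], ["C1", "C7", "C9"]
candidates = naive_candidates(A, B)
print(candidates, most_specific(candidates))
```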

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain.

⊤
├─ C12
│  ├─ C10: C1, C2
│  └─ C11: C4, C5
└─ C6
   └─ C13: C7, C8, C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
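The filter-based scaling can be checked with the following sketch. The parent map is read off the Euler tour given on the Naive Approach slide, with `"T"` as an ASCII stand-in for the top node; `up_set`, `filter_of` and `antichain_of` are hypothetical names.

```python
# Child -> parent map of the tree; "T" (top) has no parent.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def filter_of(antichain):
    """Filter generated by an antichain: union of the up-sets of its elements."""
    return set().union(*(up_set(n) for n in antichain))


def antichain_of(filter_set):
    """Most specific elements of a filter: those with no other element below them."""
    return {
        n for n in filter_set
        if not any(m != n and n in up_set(m) for m in filter_set)
    }


A, B = {"C1", "C5", "C8"}, {"C1", "C7", "C9"}
common = filter_of(A) & filter_of(B)
print(sorted(common), sorted(antichain_of(common)))
```

Each filter can grow to the size of the whole tree, which is the cost the complexity discussion attributes to this approach.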

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
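The slides do not spell out the improved approach, so the following is only one possible reading of it, for antichains of leaves: merge the positions of both antichains in Euler-tour order and issue one RMQ per pair of neighbouring positions. Restricting the queries to neighbours that come from different antichains is an assumption made here; a skipped pair could only contribute an ancestor of a node that is already found. The depth array is the one from the Naive Approach slide, and `improved_candidates` is a hypothetical name.

```python
# Depths along the Euler tour, copied from the Naive Approach slide (positions 1..25).
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the leftmost minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def improved_candidates(a_positions, b_positions):
    """At most |A| + |B| - 1 RMQs over neighbouring positions of the merged order."""
    a_pos, b_pos = set(a_positions), set(b_positions)
    merged = sorted(a_pos | b_pos)
    result = set(a_pos & b_pos)  # a shared leaf is its own common subsumer
    for left, right in zip(merged, merged[1:]):
        mixed = ((left in a_pos and right in b_pos)
                 or (left in b_pos and right in a_pos))
        if mixed:  # assumption: only pairs drawn from both antichains are queried
            result.add(rmq(left, right))
    return sorted(result)


print(improved_candidates([4, 12, 20], [4, 18, 22]))
```

On the running example this issues four queries instead of nine and returns the same candidate positions as the pairwise computation, before the final reduction to the most specific nodes.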

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation amp Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 61: Interactive Knowledge Discovery over Web of Data

Problem Statement

People who were born in Berlinbefore 1900

French Films

Person

x ď 1900

Berlin

rdftype

dbpbirthPlace

dbpbirthDate

SELECT x

WHERE

x rdftype dboPerson

x dbpbirthDate dbpBerlin

x dbpbirthPlace d

FILTER (d lt= 1900)

FrenchFilm

x France

Film

dctermssubject

rdftype

dbohasCountry

SELECT x

WHERE

x rdftype dbpCountry

x dbohasCountry dbpFrance

Mehwish Alam Interactive Knowledge Discovery over Web of Data 36

From RDF Triples to Formal Context

RDF triples

ltPerson1dcsubjectdbpcComputer_Scientistsgt

ltPerson1dcsubjectdbpcTuring_Award_Laureatesgt

ltPerson1dbpfielddbpComputer Sciencesgt

ltPerson1rdftypedboScientistsgt

Predicates ObjectsIndex URI Index URI

A dcsubject a dbpcComputer_Scientistsb dbpcTuring_Award_Laureates

B dbpaward c dbpTuringAwardC rdftype d dboScientistD dbpeld e dbpComputer SciencesE dbpbirthPlace f dboUnitedStates

g dboUnitedKingdom

A B C D Ea b c d e f g

Person1 ˆ ˆ ˆ ˆ ˆ ˆ

Person2 ˆ ˆ ˆ ˆ ˆ

Person3 ˆ ˆ ˆ ˆ ˆ

Person4 ˆ ˆ ˆ ˆ

Person5 ˆ ˆ ˆ ˆ

Person6 ˆ ˆ

Person7 ˆ ˆ

The formal context is built from RDF triples after scaling from DBpedia Each cross (ˆ)corresponds to a triple ltsubjectpredicateobjectgt

Mehwish Alam Interactive Knowledge Discovery over Web of Data 37

Getting denitions from implications and association rules

sbquo If X ugraventilde Y and Y ugraventilde X then X rdquo Ysbquo X rdquo Y is called a denitionsbquo If X ugraventilde Y and Y Ntilde X has high condence we may be in the presence ofincompletion in data

Rule Condence Support Meaning

d ugraventilde c 100 5 Every scientist has won a Turing Awardc ugraventilde d 100 5 Every person who has won a Turing Award is a scientiste ugraventilde cd 100 2 All the people having the eld computer science is a

Turing award winner scientistcd ugraventilde ab 100 5 All the Scientists winning Turing Award are catego-

rized as Turing Award Laureates and Computer Scientistsab Ntilde cd 71 7 71 of the persons categorized as Turing Award Laureates and

Computer Scientists are Scientists who have won Turing Award

Association rules for the running example

sbquo Example- c d ntilde a b- conf pta bu Ntilde tc duq ldquo 071

sbquo Entities Person6 and Person7 need completion in their descriptionsbquo In fact there is such a denition a b rdquo c d ie a b ugraventilde c d andc d ugraventilde a b

Mehwish Alam Interactive Knowledge Discovery over Web of Data 38

Possible Scenario [Alam et al IJCAI 2015]

ReferenceUniverse

FormalContext

MiningImplications

RankingImplications

Can a rule be adenitionX rdquo Y

Yes

CompleteData

Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov

Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 39

Experimentation

Dataset Cars Videogames Smartphones Countries

Dataset building conditions

Restriction dcsubject dcsubject dcsubject rdftype

dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype

dcsubject dcsubject dcsubject dcsubject

bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset Cars Videogames Smartphones Countries

Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982

Dataset characteristics

Mehwish Alam Interactive Knowledge Discovery over Web of Data 40

Evaluation

sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall

sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions

00 02 04 06 08 10

06

07

08

09

10

Recall

Pre

cisi

on

CarsSmartphonesCountriesVideogames

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 42

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6DataAnalysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45

main_topic_orpailleurmp4
Media File (videomp4)
altering_nav_spacemp4
Media File (videomp4)
nav_across_point_of_viewmp4
Media File (videomp4)
hide_partsmp4
Media File (videomp4)
search_concept_latticemp4
Media File (videomp4)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7Conclusion and Perspectives

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 50

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 51

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 52

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8References

References I

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

References II

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

References III

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

References IV

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects
• They may contain different data types and structures, including dates, numbers, collections, and strings
• To deal with such data we use Pattern Structures

[Figure: RDF graph around the entity 350GT, with edges dc:subject (to dbpc:Sports_cars and dbpc:Lamborghini_vehicles), rdf:type (to dbo:Automobile), dbo:manufacturer (to dbp:Lamborghini) and dbo:productionYear (to the literal 1964, xsd:date)]

Why Pattern Structures

Context columns: K_A, K_B, K_C, K_D, K_E (binary attributes a b c d e f g) and K_dbo:productionStartYear

Entity               Crosses (columns a-g)   K_dbo:productionStartYear
Reventon             × × × × × ×             ⟨[2008, 2008]⟩
Countach             × × × × ×               ⟨[1974, 1974]⟩
350GT                × × × × ×               ⟨[1963, 1963]⟩
400GT                × × × ×                 ⟨[1965, 1965]⟩
Islero               × × × ×                 ⟨[1967, 1967]⟩
Veneno               × ×                     ⟨[2012, 2012]⟩
Aventador Roadster   × ×                     -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$ in the heterogeneous context above, then:

• $(A_1)^{\circ\circ}$ (binary part):
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
• $(A_1)''$ (numeric part):
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
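The numeric half of this closure can be checked with a few lines of Python. This is only a sketch under stated assumptions: the years are copied from the heterogeneous context, the binary closure is taken from the slide rather than recomputed (the positions of the crosses are not recoverable here), and the function names `interval_intent` and `interval_extent` are illustrative, not part of any library.

```python
# Production start years copied from the heterogeneous context.
# Aventador Roadster has no value ("-"), so it is left out.
years = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

def interval_intent(entities):
    """Smallest interval covering the years of all given entities."""
    values = [years[e] for e in entities]
    return (min(values), max(values))

def interval_extent(interval):
    """All entities whose year lies inside the interval."""
    low, high = interval
    return {e for e, y in years.items() if low <= y <= high}

A1 = {"350GT", "400GT", "Islero"}

# Closure of A1 in the binary part, copied from the slide (assumption:
# not recomputed, since the cross positions are not available).
binary_closure = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

intent = interval_intent(A1)
numeric_closure = interval_extent(intent)

# The heterogeneous closure is the intersection of the component closures.
heterogeneous_closure = binary_closure & numeric_closure
print(intent, sorted(heterogeneous_closure))
```

Intersecting the two component closures mirrors the last step of the example, where the binary closure is cut down to $A_1$ by the interval.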

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context
• An extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$
• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Context columns: C1 C2 C4 C5 C6 C7 C8 C9

Entities S   Crosses
s1           x x x
s2           x x x
s3           x x
s4           x x x
s5           x x

[Figure: subsumption hierarchy rooted at ⊤ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in Max(Idl) : x \leq_M y\}$, where $Max(Idl)$ denotes the maximal elements of $Idl$
• The maximal elements of an order ideal are pairwise incomparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal
• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
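As an illustration of RMQ on the array above, here is a minimal sparse-table sketch. Note the swapped-in technique: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ quoted on the slide; the linear-time constructions are more involved. The names `build_sparse_table` and `rmq` are illustrative, and indices are 0-based in the code while the slide counts positions from 1.

```python
def build_sparse_table(values):
    """table[k][i] is the index of a minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table

def rmq(values, table, i, j):
    """Index of a minimal value in values[i..j] (0-based, inclusive)."""
    if i > j:
        i, j = j, i
    k = (j - i + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    return left if values[left] <= values[right] else right

array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
# Minimum over the whole array, reported as a 1-based position.
print(rmq(array, table, 0, 4) + 1)
```

Each query combines two precomputed blocks, which is what makes the per-query cost constant.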

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions of these nodes in the tour of the tree are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node:      ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D:         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

One RMQ over the depth array D is computed for each pair of positions, for example:

- RMQ(4, 4) = 4, partial result {4}
- RMQ(4, 18) = 15, partial result {4, 15}
- RMQ(20, 18) = 19, partial result {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e., the antichain $\{C_1, C_{13}\}$ once the nodes that subsume another node of the result are dropped.
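The naive pairwise procedure can be sketched as follows. The tour and the depth array are copied from the slide, with `T` standing for ⊤; the helper names (`lcs`, `subsumes`, `intersect`) are illustrative. For brevity the range minimum is found by a linear scan, an assumption made to keep the sketch short; a constant-time RMQ structure would replace that line.

```python
from itertools import product

# Tour of the tree and node depths, copied from the slide.
# Positions are 1-based on the slide and 0-based here; "T" stands for the top node.
TOUR = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
        "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First occurrence of every node in the tour.
FIRST = {}
for pos, node in enumerate(TOUR):
    FIRST.setdefault(node, pos)

def lcs(x, y):
    """Least common subsumer: the shallowest node between the two occurrences."""
    i, j = sorted((FIRST[x], FIRST[y]))
    best = min(range(i, j + 1), key=DEPTH.__getitem__)  # linear scan, not O(1) RMQ
    return TOUR[best]

def subsumes(general, specific):
    """True if `general` lies on the path from `specific` to the root."""
    return lcs(general, specific) == general

def intersect(A, B):
    """Pairwise LCS, then keep only the most specific candidates."""
    candidates = {lcs(a, b) for a, b in product(A, B)}
    return {c for c in candidates
            if not any(d != c and subsumes(c, d) for d in candidates)}

print(sorted(intersect({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
```

The set of candidates before filtering corresponds to the positions {4, 8, 15, 19, 21} on the slide, and the filtered set to {4, 21}.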

An associated scaling

• Scale the antichains to the corresponding filters
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: taxonomy tree rooted at ⊤, with children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
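The filter-based intersection can be reproduced with a small sketch. The `PARENT` map is an assumption read off the taxonomy tree used in this example (with `T` standing for ⊤), and the function names are illustrative.

```python
# Child -> parent links of the taxonomy tree (assumption: read off the
# tree used in this example); "T" stands for the top node.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}

def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def fil(antichain):
    """Filter of an antichain: every element above at least one member."""
    return set().union(*(up_set(x) for x in antichain))

def minimal_elements(nodes):
    """Drop every node that lies above another node of the set."""
    return {x for x in nodes
            if not any(y != x and x in up_set(y) for y in nodes)}

A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
print(sorted(common), sorted(minimal_elements(common)))
```

Materializing both filters touches a number of nodes proportional to the size of the tree, which is the cost the complexity discussion attributes to the scaling.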

Complexity

Complexity of Naive Approach

The number of RMQs to compute is $O(|A| \cdot |B|)$, one for each pair of elements from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

From RDF Triples to Formal Context

RDF triples

<Person1, dc:subject, dbpc:Computer_Scientists>
<Person1, dc:subject, dbpc:Turing_Award_Laureates>
<Person1, dbp:field, dbp:Computer Sciences>
<Person1, rdf:type, dbo:Scientists>

Predicates              Objects
Index  URI              Index  URI
A      dc:subject       a      dbpc:Computer_Scientists
                        b      dbpc:Turing_Award_Laureates
B      dbp:award        c      dbp:TuringAward
C      rdf:type         d      dbo:Scientist
D      dbp:field        e      dbp:Computer Sciences
E      dbp:birthPlace   f      dbo:UnitedStates
                        g      dbo:UnitedKingdom

Context columns: A (a, b), B (c), C (d), D (e), E (f, g)

Entity    Crosses (columns a-g)
Person1   × × × × × ×
Person2   × × × × ×
Person3   × × × × ×
Person4   × × × ×
Person5   × × × ×
Person6   × ×
Person7   × ×

The formal context is built from RDF triples after scaling from DBpedia. Each cross (×) corresponds to a triple <subject, predicate, object>.
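The scaling step can be sketched in a few lines: every (predicate, object) pair becomes a binary attribute of the subject. The triples are the four listed for Person1; the function names `build_context` and `extent` are illustrative assumptions, not an existing API.

```python
# The four triples listed for Person1, written as (subject, predicate, object).
triples = [
    ("Person1", "dc:subject", "dbpc:Computer_Scientists"),
    ("Person1", "dc:subject", "dbpc:Turing_Award_Laureates"),
    ("Person1", "dbp:field", "dbp:Computer Sciences"),
    ("Person1", "rdf:type", "dbo:Scientists"),
]

def build_context(triples):
    """Map each subject to its set of (predicate, object) attributes."""
    context = {}
    for subject, predicate, obj in triples:
        context.setdefault(subject, set()).add((predicate, obj))
    return context

def extent(context, attributes):
    """All subjects that carry every attribute in `attributes`."""
    return {s for s, attrs in context.items() if attributes <= attrs}

context = build_context(triples)
print(len(context["Person1"]),
      extent(context, {("rdf:type", "dbo:Scientists")}))
```

Each entry added to a subject's attribute set plays the role of one cross in the context.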

Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$
• $X \equiv Y$ is called a definition
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data

Rule                          Confidence  Support  Meaning
$d \Rightarrow c$             100%        5        Every scientist has won a Turing Award
$c \Rightarrow d$             100%        5        Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$          100%        2        Every person having the field computer science is a Turing Award winning scientist
$c, d \Rightarrow a, b$       100%        5        All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$       71%         7        71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$
• Entities Person6 and Person7 need completion in their description
• In fact there is such a definition $a, b \equiv c, d$, i.e., $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$
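The confidence value and the entities to complete can be recomputed with a short sketch. The context below is an assumption restricted to attributes a to d, chosen to be consistent with the rule table (Person1 to Person5 carry a, b, c, d; Person6 and Person7 carry only a, b); the other attributes are omitted because their positions are not recoverable here. Function names are illustrative.

```python
# Assumed context restricted to attributes a-d, consistent with the rule table.
context = {
    "Person1": {"a", "b", "c", "d"},
    "Person2": {"a", "b", "c", "d"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}

def support(attributes):
    """Number of entities that have every attribute in `attributes`."""
    return sum(1 for attrs in context.values() if attributes <= attrs)

def confidence(premise, conclusion):
    """Share of the entities satisfying the premise that also satisfy the conclusion."""
    return support(premise | conclusion) / support(premise)

def needs_completion(premise, conclusion):
    """Entities that satisfy the premise but lack part of the conclusion."""
    return sorted(e for e, attrs in context.items()
                  if premise <= attrs and not conclusion <= attrs)

print(round(confidence({"a", "b"}, {"c", "d"}), 2))
print(confidence({"c", "d"}, {"a", "b"}))
print(needs_completion({"a", "b"}, {"c", "d"}))
```

A rule that holds in one direction and falls just short in the other points at the entities returned by `needs_completion` as candidates for added triples.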

Possible Scenario [Alam et al IJCAI 2015]

[Figure: workflow. Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

Dataset       Cars               Videogames      Smartphones        Countries
Restriction   dc:subject         dc:subject      dc:subject         rdf:type
              dbpc:Sports_cars   dbpc:FPS        dbpc:Smartphones   Country
Predicates    rdf:type           rdf:type        rdf:type           rdf:type
              dc:subject         dc:subject      dc:subject         dc:subject
              bodyStyle          cp              manufacturer       language
              transmission       developer       operativeSystem    govenmentType
              assembly           requirement     developer          leaderType
              designer           genre           cpu                foundingDate
              layout             releaseDate                        gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset         Cars    Videogames   Smartphones   Countries
Subjects        529     655          363           3153
Objects         1291    3265         495           8315
Triples         12519   20146        4710          50000
Concepts        14657   31031        1232          13754
Exec time [s]   1732    1714         07            5982

Dataset characteristics

Evaluation

• There is no ground truth for the triples to be added to each dataset
• Human evaluation is taken as the ground truth; each list of implications has a recall of 100%
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
  - Classifying RDF Data
  - Creating Views through SPARQL Queries

Ways to utilize this structure
  - Data Completion
  - Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team
• Altering Navigation Space
• Navigation Across Point of Views
• Hide non-interesting parts of lattice
• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
  - Classifying RDF Data with RDF-Pattern Structures
  - Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
  - Completing RDF Data using Association Rule Mining
  - Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data






Reduction of LCS to RMQ

sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query

sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array

Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5

Computational Time

The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤

$RMQ(4, 4) = 4$, giving $\{4\}$
$RMQ(4, 18) = 15$, giving $\{4, 15\}$
$RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$
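The naive approach can be reproduced with a short sketch. The tree below is read off the Euler tour labels on the slide; the name "T" for the top element, the function names and the linear-scan RMQ are assumptions made to keep the example small.

```python
from itertools import product

# Taxonomy read off the Euler tour on the slide (parent -> children).
# "T" stands for the top element.
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return (labels, depths); list index i holds position i + 1."""
    labels, depths = [], []

    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            labels.append(node)   # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return labels, depths


def rmq(depths, i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def naive_lcs_positions(a_pos, b_pos, depths):
    """One RMQ per pair (a, b): |A| * |B| queries in total."""
    return sorted({rmq(depths, a, b) for a, b in product(a_pos, b_pos)})


def most_specific(positions, labels, depths):
    """Drop every node that is an ancestor of another node in the set."""
    first = {}
    for pos, node in enumerate(labels, start=1):
        first.setdefault(node, pos)
    nodes = {labels[p - 1] for p in positions}

    def is_ancestor(u, v):
        return labels[rmq(depths, first[u], first[v]) - 1] == u

    return {v for v in nodes
            if not any(u != v and is_ancestor(v, u) for u in nodes)}


LABELS, DEPTHS = euler_tour(TREE, "T")
A = [LABELS.index(c) + 1 for c in ("C1", "C5", "C8")]
B = [LABELS.index(c) + 1 for c in ("C1", "C7", "C9")]
POSITIONS = naive_lcs_positions(A, B, DEPTHS)
print(POSITIONS, most_specific(POSITIONS, LABELS, DEPTHS))
```

Positions 19 and 21 both carry the label C13, which is why the five positions collapse to two nodes once ancestors are removed.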

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
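The scaling to filters can be checked with a few lines. This is a sketch under the assumption that the taxonomy is stored as a child-to-parent map; the map, the name "T" for the top element and the function names are illustrative.

```python
# Child -> parent map of the taxonomy; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes greater than or equal to at least one node of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)
    return result


def minimal_elements(nodes):
    """Remove every node that is a proper ancestor of another node in the set."""
    ancestors = set()
    for node in nodes:
        up = PARENT.get(node)
        while up is not None:
            ancestors.add(up)
            up = PARENT.get(up)
    return nodes - ancestors


FIL_A = filter_of({"C1", "C5", "C8"})
FIL_B = filter_of({"C1", "C7", "C9"})
print(minimal_elements(FIL_A & FIL_B))
```

Each filter walks up to the root, so its size grows with the depth of the tree, which is the cost the slides compare against the RMQ-based intersection.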

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
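The slides do not spell out the improved approach, so the following is one plausible reading of it, stated as an assumption: merge the positions of $A$ and $B$ in sorted order and issue an RMQ only for consecutive positions that come from different antichains. On the running example this yields the same positions as the naive approach with at most $|A| + |B| - 1$ queries. The depth array is copied from the Naive Approach slide; the function names are illustrative.

```python
# Depth array D copied from the slide; list index i holds position i + 1.
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
     1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(depths, i, j):
    """1-based position of the minimal depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def improved_lcs_positions(a_pos, b_pos, depths):
    """RMQ only between neighbours of the merged, sorted position list."""
    merged = sorted([(p, "A") for p in a_pos] + [(p, "B") for p in b_pos])
    found = set()
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:          # skip pairs taken from the same antichain
            found.add(rmq(depths, p, q))
    return sorted(found)


RESULT = improved_lcs_positions([4, 12, 20], [4, 18, 22], D)
print(RESULT)
```

The linear-scan `rmq` keeps the sketch short; replacing it with a constant-time RMQ structure is what makes the query count the dominant cost.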


Getting definitions from implications and association rules

• If $X \Rightarrow Y$ and $Y \Rightarrow X$ then $X \equiv Y$.
• $X \equiv Y$ is called a definition.
• If $X \Rightarrow Y$ holds and $Y \rightarrow X$ has high confidence, we may be in the presence of incompleteness in the data.

Rule | Confidence | Support | Meaning
$d \Rightarrow c$ | 100% | 5 | Every scientist has won a Turing Award
$c \Rightarrow d$ | 100% | 5 | Every person who has won a Turing Award is a scientist
$e \Rightarrow c, d$ | 100% | 2 | All the people having the field computer science are Turing Award winner scientists
$c, d \Rightarrow a, b$ | 100% | 5 | All the scientists winning a Turing Award are categorized as Turing Award Laureates and Computer Scientists
$a, b \rightarrow c, d$ | 71% | 7 | 71% of the persons categorized as Turing Award Laureates and Computer Scientists are scientists who have won a Turing Award

Association rules for the running example

• Example:
  - $c, d \Rightarrow a, b$
  - $conf(\{a, b\} \rightarrow \{c, d\}) = 0.71$

• Entities Person6 and Person7 need completion in their description.
• In fact there is such a definition: $a, b \equiv c, d$, i.e. $a, b \Rightarrow c, d$ and $c, d \Rightarrow a, b$.
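The completion idea can be sketched on a toy formal context. The context below is hypothetical: it is not the running example of the slides, only a table built to reproduce the figures quoted there (five entities with $a, b, c, d$, two of them with $e$, and two entities with only $a, b$). The 0.7 threshold and the function names are likewise assumptions.

```python
# Hypothetical context: entity -> set of attributes.
CONTEXT = {
    "Person1": {"a", "b", "c", "d", "e"},
    "Person2": {"a", "b", "c", "d", "e"},
    "Person3": {"a", "b", "c", "d"},
    "Person4": {"a", "b", "c", "d"},
    "Person5": {"a", "b", "c", "d"},
    "Person6": {"a", "b"},
    "Person7": {"a", "b"},
}


def extent(attributes):
    """Entities whose description contains all the given attributes."""
    return {g for g, desc in CONTEXT.items() if attributes <= desc}


def confidence(x, y):
    """Confidence of the rule x -> y."""
    covered = extent(x)
    return len(extent(x | y)) / len(covered) if covered else 0.0


def entities_to_complete(x, y, threshold=0.7):
    """If x => y holds and y -> x is only highly confident, return the
    entities that prevent y => x, i.e. candidates for completion."""
    if confidence(x, y) == 1.0 and threshold <= confidence(y, x) < 1.0:
        return extent(y) - extent(x)
    return set()


CD, AB = {"c", "d"}, {"a", "b"}
print(round(confidence(AB, CD), 2))
print(sorted(entities_to_complete(CD, AB)))
```

Adding $c$ and $d$ to the returned entities would turn the association rule into an implication and the pair of rules into a definition.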

Possible Scenario [Alam et al., IJCAI 2015]

[Figure: workflow Reference Universe → Formal Context → Mining Implications → Ranking Implications → "Can a rule be a definition $X \equiv Y$?" → Yes → Complete Data]

Mining Definitions from RDF Annotations Using Formal Concept Analysis. Mehwish Alam, Aleksey Buzmakov, Victor Codocedo, Amedeo Napoli. International Joint Conference on Artificial Intelligence, 2015.

Experimentation

Dataset building conditions

            | Cars | Videogames | Smartphones | Countries
Restriction | dc:subject dbpc:Sports_cars | dc:subject dbpc:FPS | dc:subject dbpc:Smartphones | rdf:type Country
Predicates  | rdf:type | rdf:type | rdf:type | rdf:type
            | dc:subject | dc:subject | dc:subject | dc:subject
            | bodyStyle | cp | manufacturer | language
            | transmission | developer | operativeSystem | governmentType
            | assembly | requirement | developer | leaderType
            | designer | genre | cpu | foundingDate
            | layout | releaseDate | | gdpPppRank

FPS: First_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

              | Cars | Videogames | Smartphones | Countries
Subjects      | 529 | 655 | 363 | 3,153
Objects       | 1,291 | 3,265 | 495 | 8,315
Triples       | 12,519 | 20,146 | 4,710 | 50,000
Concepts      | 14,657 | 31,031 | 1,232 | 13,754
Exec time [s] | 17.32 | 17.14 | 0.7 | 59.82

Dataset characteristics

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation as the ground truth: each list of implications has a 100% recall.
• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (y-axis, 0.6 to 1.0) versus recall (x-axis, 0.0 to 1.0) for the Cars, Smartphones, Countries and Videogames datasets]

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Points of View

• Hide non-interesting parts of the lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e. $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relation, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections and strings.

• To deal with such data we use Pattern Structures.

[Figure: RDF graph around the resource 350GT, with nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edges labeled dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]

Why Pattern Structures

Contexts:   K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear
Attributes: a b c d e f g

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod D_p\ (p \in P)$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)^{\square\square}$:
  - $(A_1)^{\square} = [1963, 1967]$ and $[1963, 1967]^{\square} = A_1$
  - $(A_1)^{\square\square} = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
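The closure computed in this example can be traced with a small sketch that combines a binary part with an interval part. The production years come from the slide, but the exact attribute sets per car are an assumption: the slide only preserves how many crosses each row has, so the sets below are one assignment consistent with those counts and with the derivations shown. Function names are illustrative.

```python
# Car -> (binary attributes, production start year).
# Years are from the slide; the attribute sets are assumed (see lead-in).
CARS = {
    "Reventon": ({"a", "b", "c", "d", "e", "f"}, 2008),
    "Countach": ({"a", "b", "c", "d", "e"}, 1974),
    "350GT": ({"a", "b", "c", "d", "e"}, 1963),
    "400GT": ({"a", "b", "c", "d"}, 1965),
    "Islero": ({"a", "b", "c", "d"}, 1967),
    "Veneno": ({"a", "b"}, 2012),
    "Aventador Roadster": ({"a", "b"}, None),
}


def common_attributes(entities):
    """Binary intent: attributes shared by all the entities."""
    return set.intersection(*(CARS[g][0] for g in entities))


def entities_with(attributes):
    """Binary extent: entities having all the attributes."""
    return {g for g, (attrs, _) in CARS.items() if attributes <= attrs}


def interval_of(entities):
    """Smallest interval covering the years (assumes every entity has one)."""
    years = [CARS[g][1] for g in entities]
    return (min(years), max(years))


def entities_in(interval):
    """Entities whose year falls inside the interval."""
    low, high = interval
    return {g for g, (_, year) in CARS.items()
            if year is not None and low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
BINARY_CLOSURE = entities_with(common_attributes(A1))
INTERVAL_CLOSURE = entities_in(interval_of(A1))
# The heterogeneous closure intersects the closures of the components.
HETEROGENEOUS_CLOSURE = BINARY_CLOSURE & INTERVAL_CLOSURE
print(common_attributes(A1), interval_of(A1), HETEROGENEOUS_CLOSURE)
```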

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S | C1 C2 C4 C5 C6 C7 C8 C9
s1 | x x x
s2 | x x x
s3 | x x
s4 | x x x
s5 | x x

[Figure: subsumption hierarchy over the extended attributes ⊤, C12, C10, C11, C15, C13, C14, C6, C1, C2, C4, C5, C7, C8, C9]



Possible Scenario [Alam et al IJCAI 2015]

ReferenceUniverse

FormalContext

MiningImplications

RankingImplications

Can a rule be adenitionX rdquo Y

Yes

CompleteData

Mining Denitions from RDF Annotations Using Formal Concept Analysis Mehwish Alam Aleksey Buzmakov

Victor Codocedo Amedeo Napoli International Joint Conference on Articial Intelligence 2015

Mehwish Alam Interactive Knowledge Discovery over Web of Data 39

Experimentation

Dataset Cars Videogames Smartphones Countries

Dataset building conditions

Restriction dcsubject dcsubject dcsubject rdftype

dbpcSports_cars dbpcFPS dbpcSmartphones CountryPredicates rdftype rdftype rdftype rdftype

dcsubject dcsubject dcsubject dcsubject

bodyStyle cp manufacturer languagetransmission developer operativeSystem govenmentTypeassembly requirement developer leaderTypedesigner genre cpu foundingDatelayout releaseDate gdpPppRank

Front_Person_Shooters computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset Cars Videogames Smartphones Countries

Subjects 529 655 363 3153 Objects 1291 3265 495 8315 Triples 12519 20146 4710 50000 Concepts 14657 31031 1232 13754Exec time [s] 1732 1714 07 5982

Dataset characteristics

Mehwish Alam Interactive Knowledge Discovery over Web of Data 40

Evaluation

sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall

sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions

00 02 04 06 08 10

06

07

08

09

10

Recall

Pre

cisi

on

CarsSmartphonesCountriesVideogames

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 42

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6DataAnalysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45

main_topic_orpailleurmp4
Media File (videomp4)
altering_nav_spacemp4
Media File (videomp4)
nav_across_point_of_viewmp4
Media File (videomp4)
hide_partsmp4
Media File (videomp4)
search_concept_latticemp4
Media File (videomp4)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7Conclusion and Perspectives

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., $X \to Y$ and $Y \to X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, strings

• To deal with such data we use Pattern Structures

[Figure: RDF graph for the entity 350GT: dc:subject dbpc:Sports_cars, dc:subject dbpc:Lamborghini_vehicles, rdf:type dbo:Automobile, dbo:manufacturer dbp:Lamborghini, dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Entity              Crosses under a b c d e f g (K_A to K_E)   K_dbo:productionStartYear
Reventon            × × × × × ×                                $\langle [2008, 2008] \rangle$
Countach            × × × × ×                                  $\langle [1974, 1974] \rangle$
350GT               × × × × ×                                  $\langle [1963, 1963] \rangle$
400GT               × × × ×                                    $\langle [1965, 1965] \rangle$
Islero              × × × ×                                    $\langle [1967, 1967] \rangle$
Veneno              × ×                                        $\langle [2012, 2012] \rangle$
Aventador Roadster  × ×                                        -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
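
The closure in this example can be sketched in Python, assuming each car is described by a set of binary attributes plus a year interval. The years and the fact that the first five cars share a, b, c, d come from the slides; the placement of the remaining crosses, all function names, and the treatment of a missing year as "no constraint" are assumptions made for illustration only.

```python
from functools import reduce

# Binary attributes and production start year per car. The years and the
# shared attributes a, b, c, d of the first five cars follow the slides;
# every other cross (e, f, g and the last two rows) is a hypothetical placement.
CARS = {
    "Reventon": ({"a", "b", "c", "d", "e", "f"}, 2008),
    "Countach": ({"a", "b", "c", "d", "e"}, 1974),
    "350GT": ({"a", "b", "c", "d", "e"}, 1963),
    "400GT": ({"a", "b", "c", "d"}, 1965),
    "Islero": ({"a", "b", "c", "d"}, 1967),
    "Veneno": ({"a", "g"}, 2012),
    "Aventador Roadster": ({"a", "g"}, None),  # year missing in the table
}


def describe(car):
    """Map a car to its description: (attribute set, [year, year] or None)."""
    attrs, year = CARS[car]
    return frozenset(attrs), (None if year is None else (year, year))


def meet(d1, d2):
    """Similarity of two descriptions: shared attributes and interval hull."""
    (attrs1, iv1), (attrs2, iv2) = d1, d2
    if iv1 is None or iv2 is None:
        hull = None  # assumption: None stands for "no constraint on the year"
    else:
        hull = (min(iv1[0], iv2[0]), max(iv1[1], iv2[1]))
    return attrs1 & attrs2, hull


def subsumes(general, specific):
    """True if `specific` satisfies every constraint of `general`."""
    (g_attrs, g_iv), (s_attrs, s_iv) = general, specific
    if not g_attrs <= s_attrs:
        return False
    if g_iv is None:
        return True
    return s_iv is not None and g_iv[0] <= s_iv[0] and s_iv[1] <= g_iv[1]


def intent(cars):
    """Common description of a set of cars."""
    return reduce(meet, (describe(c) for c in cars))


def extent(pattern):
    """All cars whose description is subsumed by the pattern."""
    return {c for c in CARS if subsumes(pattern, describe(c))}


A1 = {"350GT", "400GT", "Islero"}
pattern = intent(A1)
closure = extent(pattern)
```

Under these assumptions, `pattern` pairs the attribute set {a, b, c, d} with the interval (1963, 1967), and `closure` is `A1` again: the year component excludes Reventon and Countach even though they share the four attributes.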

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S   Crosses under C1 C2 C4 C5 C6 C7 C8 C9
s1           x x x
s2           x x x
s3           x x
s4           x x x
s5           x x

[Figure: subsumption hierarchy rooted at $\top$ over the nodes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \le y\}$

• The maximal elements of an ideal are pairwise not comparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \le ac_1 \text{ and } m \le ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
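
As an illustration of constant-time range minimum queries, here is a minimal Python sketch based on a sparse table. This is a simpler, swapped-in technique: it needs O(n log n) preprocessing rather than the O(n) bound quoted on the slide, which requires a more involved structure. The function names are hypothetical, and positions are 1-based to match the slide.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """1-based position of the minimum of values over positions lo..hi."""
    if lo > hi:
        lo, hi = hi, lo
    i, j = lo - 1, hi - 1  # switch to 0-based indices
    k = (j - i + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
```

For the array on the slide, `rmq(array, table, 1, 5)` returns position 3, where the value 0 sits.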

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
Node      ⊤    C12  C10  C1   C10  C2   C10  C12  C11  C4   C11  C5   C11  C12  ⊤    C6   C13  C7   C13  C8   C13  C9   C13  C6   ⊤
D         0    1    2    3    2    3    2    1    2    3    2    3    2    1    0    1    2    3    2    3    2    3    2    1    0

• $RMQ(4, 4) = 4$, partial result $\{4\}$

• $RMQ(4, 18) = 15$, partial result $\{4, 15\}$

• $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$, i.e., the antichain $\{C_1, C_{13}\}$
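
The pairwise procedure above can be sketched in Python as follows. The tour and depth array are copied from the slide, with "T" standing for the top element; the helper names are hypothetical, and the range minimum is found by a plain linear scan rather than a preprocessed structure, so this only illustrates the logic, not the stated running time.

```python
# Euler tour of the taxonomy and the depth of each visited node (from the slide).
TOUR = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
        "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

# First and last 1-based position of every node in the tour.
FIRST, LAST = {}, {}
for pos, node in enumerate(TOUR, start=1):
    FIRST.setdefault(node, pos)
    LAST[node] = pos


def rmq(lo, hi):
    """1-based position of the smallest depth between lo and hi (linear scan)."""
    lo, hi = min(lo, hi), max(lo, hi)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_ancestor(u, v):
    """True if u is a proper ancestor of v in the tree."""
    return u != v and FIRST[u] <= FIRST[v] and LAST[v] <= LAST[u]


def naive_intersection(a, b):
    """Antichain intersection with one RMQ per pair, i.e. |a| * |b| queries."""
    subsumers = {TOUR[rmq(FIRST[x], FIRST[y]) - 1] for x in a for y in b}
    # Keep only the most specific subsumers (drop ancestors of other results).
    return {u for u in subsumers
            if not any(is_ancestor(u, v) for v in subsumers)}


A = ["C1", "C5", "C8"]
B = ["C1", "C7", "C9"]
result = naive_intersection(A, B)
```

Here `result` is the set containing C1 and C13, which matches the positions 4 and 21 kept on the slide.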

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Figure: taxonomy tree with root ⊤; ⊤ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
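
A minimal Python sketch of this filter-based scaling, assuming the taxonomy is stored as a child-to-parent map (with "T" for the top element). The map is read off the tree on the slide; the function names are hypothetical.

```python
# Parent of every node in the taxonomy; "T" denotes the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def upset(node):
    """The node together with all of its ancestors up to the top."""
    chain = {node}
    while node in PARENT:
        node = PARENT[node]
        chain.add(node)
    return chain


def fil(antichain):
    """Filter generated by an antichain: union of the upsets of its elements."""
    return set().union(*(upset(x) for x in antichain))


def most_specific(nodes):
    """Elements of `nodes` that are not a proper ancestor of another element."""
    return {u for u in nodes
            if not any(u != v and u in upset(v) for v in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
antichain = most_specific(common)
```

Each filter can hold as many nodes as the whole tree, which is why this route costs more than querying the leaves directly.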

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$ and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more expensive than $O(|Leaves(T)|)$ for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 65: Interactive Knowledge Discovery over Web of Data

Experimentation

Dataset building conditions

Dataset       Cars              Videogames   Smartphones        Countries
Restriction   dc:subject        dc:subject   dc:subject         rdf:type
              dbpc:Sports_cars  dbpc:FPS     dbpc:Smartphones   Country
Predicates    rdf:type          rdf:type     rdf:type           rdf:type
              dc:subject        dc:subject   dc:subject         dc:subject
              bodyStyle         cp           manufacturer       language
              transmission      developer    operativeSystem    govenmentType
              assembly          requirement  developer          leaderType
              designer          genre        cpu                foundingDate
              layout            releaseDate                     gdpPppRank

FPS: Front_Person_Shooters; cp: computerPlatform

Predicates used to construct each dataset

Dataset Characteristics

Dataset        Cars    Videogames  Smartphones  Countries
Subjects       529     655         363          3153
Objects        1291    3265        495          8315
Triples        12519   20146       4710         50000
Concepts       14657   31031       1232         13754
Exec time [s]  1732    1714        07           5982

Dataset characteristics

Evaluation

sbquo No ground truth for the triples to be added to each datasetsbquo Human evaluation as the ground truth each list of implications has a 100recall

sbquo A precision of 09 indicates that 9 out of 10 implications can be transformedinto denitions

00 02 04 06 08 10

06

07

08

09

10

Recall

Pre

cisi

on

CarsSmartphonesCountriesVideogames

Mehwish Alam Interactive Knowledge Discovery over Web of Data 41

Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 42

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6DataAnalysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45

main_topic_orpailleurmp4
Media File (videomp4)
altering_nav_spacemp4
Media File (videomp4)
nav_across_point_of_viewmp4
Media File (videomp4)
hide_partsmp4
Media File (videomp4)
search_concept_latticemp4
Media File (videomp4)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7Conclusion and Perspectives

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 50

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 51

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 52

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8References

References I

Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis

Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754

Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer

Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 56

References II

Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial

Intelligence pages 13421347

Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13

2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer

Mehwish Alam Interactive Knowledge Discovery over Web of Data 57

References III

Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June

23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer

Science pages 153168 Springer

Rudolph S (2006)Relational exploration combining description logics and formal concept

analysis for knowledge specicationPhD thesis Dresden University of Technology

Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 58

References IV

Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473

Mehwish Alam Interactive Knowledge Discovery over Web of Data 59

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 64

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 65

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 66

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in \max(Idl) : x \leq y\}$.

• The maximal elements of an order ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
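A minimal sketch of an RMQ structure, applied to the five-element array from the slide. It uses a sparse table, which is a swapped-in, simpler technique: preprocessing costs $O(n \log n)$ rather than the $O(n)$ quoted above, while queries remain $O(1)$. The function names `build_sparse_table` and `rmq` are illustrative and do not come from the slides.

```python
from typing import List


def build_sparse_table(values: List[int]) -> List[List[int]]:
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]          # windows of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values: List[int], table: List[List[int]], lo: int, hi: int) -> int:
    """Index (0-based) of the minimal value in values[lo..hi], bounds inclusive."""
    if lo > hi:
        lo, hi = hi, lo
    k = (hi - lo + 1).bit_length() - 1     # largest power of two fitting in the range
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
position = rmq(array, table, 0, 4) + 1     # back to the 1-based positions of the slide
```

Two overlapping power-of-two windows always cover the queried range, which is why a query needs only two table lookups.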

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the Euler tour of the tree are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

Steps shown on the slide:

• $RMQ(4, 4) = 4$, partial result $\{4\}$
• $RMQ(4, 18) = 15$, partial result $\{4, 15\}$
• $RMQ(20, 18) = 19$, partial result $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$ after keeping only the most specific nodes, i.e. the antichain $\{C_1, C_{13}\}$.
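The Euler tour and the pairwise queries above can be sketched as follows. The tree is read off the node sequence on the slide, the string "T" stands in for the top element, and a linear scan replaces a constant-time RMQ for brevity. All function names are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product

# Tree reconstructed from the Euler tour on the slide; "T" denotes the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (labels, depths); a node is recorded again after each of its children."""
    labels, depths = [], []

    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            labels.append(node)
            depths.append(depth)

    visit(root, 0)
    return labels, depths


LABELS, DEPTHS = euler_tour("T")
FIRST = {}
for pos, label in enumerate(LABELS):
    FIRST.setdefault(label, pos)          # 0-based first occurrence of each node


def lcs(x, y):
    """LCS of x and y, i.e. their lowest common ancestor, via a range minimum on DEPTHS."""
    lo, hi = sorted((FIRST[x], FIRST[y]))
    best = min(range(lo, hi + 1), key=DEPTHS.__getitem__)   # linear-scan RMQ
    return LABELS[best]


def intersect_naive(a, b):
    """One RMQ per pair (x, y), then drop every candidate that has a descendant candidate."""
    candidates = {lcs(x, y) for x, y in product(a, b)}
    return {c for c in candidates
            if not any(d != c and lcs(c, d) == c for d in candidates)}


result = intersect_naive({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
```

With 1-based numbering, `FIRST["C5"] + 1` is 12 and `FIRST["C7"] + 1` is 18, matching the positions given for $A$ and $B$.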

An associated scaling

• Scale the antichains to the corresponding filters.
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

[Figure: tree with root $\top$; $\top$ has children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
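The scaling can be checked with a few lines of code. This is a sketch under the assumption that the hierarchy is the tree shown on the slide, encoded here as a child-to-parent map with "T" for the top element; the helper names are hypothetical.

```python
# Child -> parent map of the tree on the slide; "T" denotes the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def filter_of(antichain):
    """Union of the up-sets of the antichain's elements."""
    return set().union(*(up_set(x) for x in antichain))


def most_specific(nodes):
    """Elements of the set that have no proper descendant inside the set."""
    return {x for x in nodes
            if not any(y != x and x in up_set(y) for y in nodes)}


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
antichain = most_specific(common)
```

Each filter can contain up to every node of the tree, which is the source of the higher cost of this approach compared with querying the antichain elements directly.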

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly.
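One way to obtain a linear number of candidate queries is sketched below; it is an assumption about how the improved approach might work, not necessarily the authors' exact procedure. Each element of $A$ is queried only against its nearest neighbours from $B$ in Euler-tour order, since a farther element of $B$ can only yield a more general ancestor. The arrays are copied from the Naive Approach slide with "T" for the top element. A linear scan stands in for a constant-time RMQ, and the final removal of non-maximal candidates is done naively.

```python
DEPTHS = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
          1, 2, 3, 2, 3, 2, 3, 2, 1, 0]
LABELS = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
          "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
FIRST = {}
for pos, label in enumerate(LABELS):
    FIRST.setdefault(label, pos)           # 0-based first occurrence


def rmq(lo, hi):
    """Position of the minimal depth in [lo, hi]; a scan replaces an O(1) structure."""
    return min(range(lo, hi + 1), key=DEPTHS.__getitem__)


def candidate_positions(a, b):
    """At most two RMQs per element of a: nearest b-position on each side."""
    pos_a = sorted(FIRST[x] for x in a)
    pos_b = sorted(FIRST[x] for x in b)
    found = set()
    j = 0
    for p in pos_a:
        while j < len(pos_b) and pos_b[j] < p:
            j += 1
        if j > 0:                           # nearest b-position before p
            found.add(rmq(pos_b[j - 1], p))
        if j < len(pos_b):                  # nearest b-position at or after p
            found.add(rmq(p, pos_b[j]))
    return found


def intersect_by_neighbours(a, b):
    nodes = {LABELS[c] for c in candidate_positions(a, b)}

    def has_descendant(x, y):               # True if y lies strictly below x
        lo, hi = sorted((FIRST[x], FIRST[y]))
        return x != y and LABELS[rmq(lo, hi)] == x

    return {x for x in nodes if not any(has_descendant(x, y) for y in nodes)}


A, B = {"C1", "C5", "C8"}, {"C1", "C7", "C9"}
positions = sorted(p + 1 for p in candidate_positions(A, B))   # 1-based
result = intersect_by_neighbours(A, B)
```

On the running example the candidate positions coincide with the set $\{4, 8, 15, 19, 21\}$ shown on the slide, and the filtered result is again $\{C_1, C_{13}\}$.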

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures

Evaluation

• No ground truth for the triples to be added to each dataset.
• Human evaluation serves as the ground truth; each list of implications has 100% recall.

• A precision of 0.9 indicates that 9 out of 10 implications can be transformed into definitions.

[Figure: precision (0.6 to 1.0) plotted against recall (0.0 to 1.0) for the datasets Cars, Smartphones, Countries and Videogames]

Roadmap

Building a classification structure over Web of Data
• Classifying RDF Data
• Creating Views through SPARQL Queries

Ways to utilize this structure
• Data Completion
• Data Analysis: Interactive Exploration and KDD through visualization

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.


Thank you for your attention

8 References

References I

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

References II

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

References III

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

References IV

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may contain different data types and structures, including dates, numbers, collections and strings.

• To deal with such data we use Pattern Structures.

[Figure: RDF graph around the resource 350GT, linked by dc:subject to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, by rdf:type to dbo:Automobile, by dbo:manufacturer to dbp:Lamborghini, and by dbo:productionYear to the literal 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values:

Contexts    $K_A$ $K_B$ $K_C$ $K_D$ $K_E$ $K_{dbo:productionStartYear}$
Attributes  a b c d e f g

Reventon             × × × × × ×   $\langle [2008, 2008] \rangle$
Countach             × × × × ×     $\langle [1974, 1974] \rangle$
350GT                × × × × ×     $\langle [1963, 1963] \rangle$
400GT                × × × ×       $\langle [1965, 1965] \rangle$
Islero               × × × ×       $\langle [1967, 1967] \rangle$
Veneno               × ×           $\langle [2012, 2012] \rangle$
Aventador Roadster   × ×           -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $range(p) \subseteq U$, $p$ is an object relation;
  - if $range(p) \subseteq L$, $p$ is a literal relation.

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$;
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$.

• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$;
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$.

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$; then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)^{\square\square}$
  - $(A_1)^{\square} = [1963, 1967]$ and $[1963, 1967]^{\square} = A_1$
  - $(A_1)^{\square\square} = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
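The numeric part of this example can be reproduced with a short sketch. It assumes interval descriptions over the production years listed in the heterogeneous context, where the description of a set of entities is the smallest interval covering their years. The names `describe` and `extent` are hypothetical, and the entity without a year (Aventador Roadster) is left out.

```python
# Production start years from the heterogeneous context above.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    # "Aventador Roadster" has no year in the table and is omitted here.
}


def describe(entities):
    """Smallest interval covering the years of the given entities."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent(interval):
    """All entities whose year falls inside the interval."""
    lo, hi = interval
    return {e for e, year in YEARS.items() if lo <= year <= hi}


a1 = {"350GT", "400GT", "Islero"}
interval = describe(a1)      # description shared by the entities of a1
closure = extent(interval)   # entities matching that description
```

Here `interval` is `(1963, 1967)` and `closure` equals `a1`, so the set is closed with respect to the numeric component, in line with the derivation on the slide.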



Roadmap

Building a classication structure over Web of DataClassifying RDF DataCreating Views through SPARQL Queries

Ways to utilize this structureData CompletionData Analysis Interactive Exploration and KDD through visualization

Mehwish Alam Interactive Knowledge Discovery over Web of Data 42

Mehwish Alam Interactive Knowledge Discovery over Web of Data 43

6DataAnalysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 44

RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45

main_topic_orpailleurmp4
Media File (videomp4)
altering_nav_spacemp4
Media File (videomp4)
nav_across_point_of_viewmp4
Media File (videomp4)
hide_partsmp4
Media File (videomp4)
search_concept_latticemp4
Media File (videomp4)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7Conclusion and Perspectives

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 50


Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, and strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph around the resource 350GT, with the triples
  350GT dc:subject dbpc:Sports_cars
  350GT dc:subject dbpc:Lamborghini_vehicles
  350GT rdf:type dbo:Automobile
  350GT dbo:manufacturer dbp:Lamborghini
  350GT dbo:productionYear 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values

Entity | K_A K_B K_C K_D K_E (attributes a b c d e f g) | K_dbo:productionStartYear
Reventon | × × × × × × | ⟨[2008, 2008]⟩
Countach | × × × × × | ⟨[1974, 1974]⟩
350GT | × × × × × | ⟨[1963, 1963]⟩
400GT | × × × × | ⟨[1965, 1965]⟩
Islero | × × × × | ⟨[1967, 1967]⟩
Veneno | × × | ⟨[2012, 2012]⟩
Aventador Roadster | × × | -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

In the heterogeneous context with numeric values above, let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
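The closure computed on this slide can be retraced with a short sketch. It is only an illustration: the flattened table does not preserve which columns carry the crosses, so the attributes beyond {a, b, c, d} in `BINARY` are an assumption chosen to match the row counts, and all function names are hypothetical.

```python
# Binary part of the context. Attributes beyond {a, b, c, d} are ASSUMED
# placements; only the number of crosses per row comes from the slide.
BINARY = {
    "Reventon": set("abcdef"),
    "Countach": set("abcde"),
    "350GT": set("abcde"),
    "400GT": set("abcd"),
    "Islero": set("abcd"),
    "Veneno": set("ag"),
    "Aventador Roadster": set("ag"),
}

# Numeric part: production start year (None where the slide shows "-").
YEAR = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}


def common_attributes(objs):
    """Binary derivation: attributes shared by all given objects."""
    return set.intersection(*(BINARY[g] for g in objs))


def objects_having(attrs):
    """Binary derivation: objects that carry all given attributes."""
    return {g for g, owned in BINARY.items() if attrs <= owned}


def year_interval(objs):
    """Interval pattern: smallest interval covering the years of the objects.
    Assumes every object in objs has a known year."""
    years = [YEAR[g] for g in objs]
    return (min(years), max(years))


def objects_in_interval(interval):
    """Objects whose year lies inside the interval."""
    low, high = interval
    return {g for g, y in YEAR.items() if y is not None and low <= y <= high}


def heterogeneous_closure(objs):
    """Intersect the closures obtained in the binary and numeric parts."""
    binary_closure = objects_having(common_attributes(objs))
    interval_closure = objects_in_interval(year_interval(objs))
    return binary_closure & interval_closure


A1 = {"350GT", "400GT", "Islero"}
print(sorted(common_attributes(A1)))
print(year_interval(A1))
print(sorted(heterogeneous_closure(A1)))
```

The binary part alone closes $A_1$ to five cars; it is the interval component that cuts the extent back to $A_1$.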

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• An extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^{*}}$

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$

Entities S | C1 C2 C4 C5 C6 C7 C8 C9
s1 | x x x
s2 | x x x
s3 | x x
s4 | x x x
s5 | x x

[Figure: subsumption hierarchy rooted at ⊤, with inner nodes C10 to C15 above the attributes C1, C2 and C4 to C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$

• The maximal elements of an order ideal are pairwise not comparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$
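The infimum of two antichains defined above can be sketched directly from the definition: take the downsets of both antichains, intersect them, and keep the maximal elements. The small poset `LOWER_COVERS` and the function names are hypothetical and serve only as an illustration; this is not taken from the cited paper's implementation.

```python
# Hypothetical poset given by its covering relation:
# each key maps to the elements lying directly below it.
LOWER_COVERS = {
    "top": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d", "e"],
}


def downset(elements, lower_covers):
    """Order ideal generated by the elements: everything below or equal."""
    seen = set()
    stack = list(elements)
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(lower_covers.get(x, ()))
    return seen


def maximal_elements(subset, lower_covers):
    """Elements of subset with no other element of subset strictly above."""
    strictly_below = set()
    for x in subset:
        strictly_below |= downset(lower_covers.get(x, ()), lower_covers)
    return subset - strictly_below


def antichain_infimum(ac1, ac2, lower_covers):
    """Maximal elements of the ideal of all m below some ac1 and some ac2."""
    common = downset(ac1, lower_covers) & downset(ac2, lower_covers)
    return maximal_elements(common, lower_covers)


print(antichain_infimum({"a"}, {"b"}, LOWER_COVERS))
print(antichain_infimum({"a"}, {"c", "e"}, LOWER_COVERS))
```

In this toy poset, `a` and `b` only share `d` below them, so the first call returns the antichain containing `d` alone.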

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
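As an illustration of constant-time range minimum queries, here is a minimal sparse-table sketch. Note that a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted on the slide; it is swapped in here because it is much shorter than the linear-time constructions. Function names are hypothetical and indices are 0-based, whereas the slide counts positions from 1.

```python
def build_sparse_table(values):
    """table[j][i] = index of a minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """Index of a minimal value in values[lo..hi] (inclusive, 0-based)."""
    j = (hi - lo + 1).bit_length() - 1
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]          # the array from the slide
table = build_sparse_table(values)
print(rmq(values, table, 0, 4) + 1)   # 1-based position of the minimum
```

Each query combines two overlapping power-of-two windows that together cover the range, which is why no loop is needed at query time.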

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Position    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node        ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D           0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

- $RMQ(4, 4) = 4$, giving $\{4\}$
- $RMQ(4, 18) = 15$, giving $\{4, 15\}$
- $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$, i.e., the antichain $\{C1, C13\}$
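The naive approach can be retraced with the following sketch. The tree is read off the node sequence shown on the slide; `TOP` stands for ⊤. The function names are hypothetical, and the range minimum is found by a plain linear scan rather than a preprocessed RMQ structure, so this only illustrates the pairwise scheme, not its optimal running time.

```python
# Tree reconstructed from the node sequence on the slide (TOP stands for the top element).
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Visit order and depths, revisiting a node after each of its children."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def lcs(a, b, nodes, depths, first):
    """Least common subsumer: shallowest node between the first occurrences."""
    lo, hi = sorted((first[a], first[b]))
    best = min(range(lo, hi + 1), key=depths.__getitem__)  # naive range minimum
    return nodes[best]


def intersect_antichains(A, B, tree, root):
    nodes, depths = euler_tour(tree, root)
    first = {}
    for i, node in enumerate(nodes):
        first.setdefault(node, i)
    # One query per pair (a, b): |A| * |B| queries in total.
    candidates = {lcs(a, b, nodes, depths, first) for a in A for b in B}

    def is_proper_ancestor(x, y):
        return x != y and lcs(x, y, nodes, depths, first) == x

    # Keep only the most specific candidates.
    return {c for c in candidates
            if not any(is_proper_ancestor(c, d) for d in candidates)}


result = intersect_antichains({"C1", "C5", "C8"}, {"C1", "C7", "C9"}, TREE, "TOP")
print(sorted(result))
```

The depth list produced by `euler_tour` matches the array D on the slide, with positions shifted by one because Python indexes from 0.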

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain

[Figure: tree rooted at ⊤ with children C12 and C6; C12 has the children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has the child C13 (with leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$
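The scaling to filters can be sketched as follows, using the same tree encoded as a child-to-parent map (`TOP` stands for ⊤). The names `PARENT`, `filter_of` and `minimal_elements` are hypothetical; the sketch assumes a tree-shaped hierarchy, as in the figure.

```python
# Child -> parent map of the tree in the figure (TOP has no parent).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain, parent):
    """All nodes greater than or equal to some element of the antichain."""
    result = set()
    for node in antichain:
        # Walk up to the root; stop early once a visited ancestor is reached.
        while node is not None and node not in result:
            result.add(node)
            node = parent.get(node)
    return result


def minimal_elements(up_closed, parent):
    """In an up-closed set of a tree, a node is minimal iff it is not the
    parent of another member."""
    parents_of_members = {parent[n] for n in up_closed if n in parent}
    return up_closed - parents_of_members


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = filter_of(A, PARENT) & filter_of(B, PARENT)
print(sorted(minimal_elements(common, PARENT)))
```

Both filters have to be materialised before they can be intersected, which is the source of the $O(|T|)$ cost discussed on the complexity slides.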

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, since every element of $A$ is paired with every element of $B$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
  • RDF Pattern Structures
  • Creating Views over RDF-Graphs
    • View By
  • Completing RDF Data
    • Motivation
    • Methodology
    • Experimentation & Evaluation
  • Data Analysis through RV-Xplorer
  • Conclusion and Perspectives
  • References
  • Heterogeneous Pattern Structures
  • RDF Pattern Structures

6 Data Analysis through RV-Xplorer

User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main Topic of Research of a team

• Altering Navigation Space

• Navigation Across Point of Views

• Hide non-interesting parts of lattice

• Search Capabilities

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data








An associated scaling

• Scale the antichains to the corresponding filters.
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

Taxonomy (children are indented under their parent):

⊤
  C12
    C10
      C1, C2
    C11
      C4, C5
  C6
    C13
      C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
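The scaling to filters can be sketched as follows. The parent map is a hypothetical encoding of the taxonomy on the slide (`TOP` stands for ⊤), and the helper names are assumptions made for this illustration.

```python
# Child -> parent map for the taxonomy on the slide; "TOP" is the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def fil(antichain):
    """Filter of an antichain: everything above at least one of its elements."""
    return set().union(*(up_set(n) for n in antichain))


def minimal(nodes):
    """Most specific elements: those that are not above another element of the set."""
    return {n for n in nodes if not any(m != n and n in up_set(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
antichain = minimal(common)
```

Each filter can contain a number of nodes proportional to the size of the tree, which is the cost the complexity slide attributes to this approach.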

Complexity

Complexity of the naive approach

The number of RMQs is $O(|A||B|)$, one per pair $(a, b)$ with $a \in A$ and $b \in B$.

Complexity of the improved approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of the associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than $O(|Leaves(T)|)$ for intersecting antichains directly.

About the complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
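The improved approach is not spelled out in this part of the transcript. The sketch below is one plausible reading, under the assumption that the positions of both antichains are merged and sorted, and that an RMQ is issued only for neighbouring positions that come from different antichains. The depth array is the one from the naive-approach slide; the function names are hypothetical and the RMQ is again a simple linear scan.

```python
# Depth array of the Euler tour from the naive-approach slide (positions are 1-based).
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the minimum depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def improved_candidates(a_positions, b_positions):
    """Query only neighbours in the merged, sorted list that come from different sets."""
    tagged = sorted([(p, "A") for p in a_positions] + [(p, "B") for p in b_positions])
    found = set()
    for (p, origin_p), (q, origin_q) in zip(tagged, tagged[1:]):
        if origin_p != origin_q:  # assumption: same-origin neighbours are skipped
            found.add(rmq(p, q))
    return sorted(found)


candidates = improved_candidates([4, 12, 20], [4, 18, 22])
```

On the example this issues five queries instead of nine and yields the same candidate positions as the naive approach.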


User Interface of RV-Xplorer (Rdf-View eXplorer)

RV-Xplorer (Software Demo)

• Main topic of research of a team
• Altering the navigation space
• Navigation across points of view
• Hiding non-interesting parts of the lattice
• Search capabilities

Demo videos (video/mp4): main_topic_orpailleur.mp4, altering_nav_space.mp4, nav_across_point_of_view.mp4, hide_parts.mp4, search_concept_lattice.mp4

7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
• Classifying RDF data with RDF-Pattern Structures
• Creating views through SPARQL queries: Lattice-Based View Access

Ways to utilize this structure
• Completing RDF data using association rule mining
• Data analysis and exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• Deal with near-definitions, i.e. cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.

• Consider the complete schema instead of only the taxonomy, i.e. the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e. RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.

Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections, and strings.

• To deal with such data, we use Pattern Structures.

[Figure: RDF graph around the entity 350GT, with the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and the edge labels dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear.]

Heterogeneous context with numeric values

Contexts: K_A, K_B, K_C, K_D, K_E (binary attributes a, b, c, d, e, f, g) and K_dbo:productionStartYear

Entity              Crosses over a..g   dbo:productionStartYear
Reventon            × × × × × ×         ⟨[2008, 2008]⟩
Countach            × × × × ×           ⟨[1974, 1974]⟩
350GT               × × × × ×           ⟨[1963, 1963]⟩
400GT               × × × ×             ⟨[1965, 1965]⟩
Islero              × × × ×             ⟨[1967, 1967]⟩
Veneno              × ×                 ⟨[2012, 2012]⟩
Aventador Roadster  × ×                 -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures: example

Let $A_1 = \{350GT, 400GT, Islero\}$; then:

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
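The numeric part of this example can be reproduced with a short sketch. It assumes the usual similarity on intervals (the smallest interval containing both arguments); the years are read off the heterogeneous context, the extent of {a, b, c, d} is taken as stated on the slide, and all names are hypothetical.

```python
from functools import reduce

# Production start years from the heterogeneous context.
# Aventador Roadster has no value there ("-"), so it is left out.
START_YEAR = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

# Extent of {a, b, c, d} in the binary part, as stated on the slide.
BINARY_EXTENT = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def meet(i, j):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i[0], j[0]), max(i[1], j[1]))


def describe(entities):
    """Common interval description of a non-empty set of entities."""
    return reduce(meet, ((START_YEAR[g], START_YEAR[g]) for g in entities))


def extent(interval):
    """Entities whose year lies inside the interval."""
    low, high = interval
    return {g for g, year in START_YEAR.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
interval = describe(A1)
closure = BINARY_EXTENT & extent(interval)
```

Here `interval` is the description of A1 in the numeric context, and `closure` intersects the two partial closures as in the last step on the slide.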

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Formal context (entities S against the attributes C1, C2, C4, C5, C6, C7, C8, C9):

Entity  Crosses
s1      x x x
s2      x x x
s3      x x
s4      x x x
s5      x x

[Figure: subsumption hierarchy over ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9.]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$.

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.
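The infimum of two antichains defined above can be sketched directly from the formula. As an assumption for illustration only, the ordered attribute set is taken to be the small taxonomy used in the antichain example slides (`TOP` stands for ⊤), with $x \leq y$ meaning that $y$ is $x$ or one of its ancestors; the two input antichains are made up for the example.

```python
# Child -> parent map of a small taxonomy, used here as the ordered attribute set.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}
ELEMENTS = set(PARENT) | {"TOP"}


def leq(x, y):
    """x <= y holds when y is x itself or an ancestor of x."""
    while x != y and x in PARENT:
        x = PARENT[x]
    return x == y


def infimum(ac1, ac2):
    """Maximal elements of the ideal of common lower bounds of the two antichains."""
    ideal = {m for m in ELEMENTS
             if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)}
    return {m for m in ideal if not any(m != n and leq(m, n) for n in ideal)}


# Hypothetical input antichains, chosen only to exercise the definition.
result = infimum({"C10", "C13"}, {"C12", "C7"})
```

For these inputs the ideal of common lower bounds is {C10, C1, C2, C7}, and its maximal elements C10 and C7 form the resulting antichain.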

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2  1  0  3  2 ]
Positions    1  2  3  4  5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query, where $n$ is the number of elements in the array.
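A minimal RMQ sketch over the example array is given below. It uses a sparse table, a swapped-in and simpler technique than the one behind the bound on the slide: queries take constant time, but preprocessing is $O(n \log n)$ rather than $O(n)$. The function names are hypothetical, and positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # level 0: every index is its own minimum
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, i, j):
    """1-based position of the minimal value between 1-based positions i and j."""
    lo, hi = min(i, j) - 1, max(i, j) - 1
    k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting in the range
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return (left if values[left] <= values[right] else right) + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
whole_range = rmq(ARRAY, TABLE, 1, 5)
```

For the example array, the minimum over positions 1 to 5 is the value 0 at position 3.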



RV-Xplorer (Software Demo)

sbquo Main Topic of Research of a team

sbquo Altering Navigation Space

sbquo Navigation Across Point of Views

sbquo Hide non-interesting parts of lattice

sbquo Search Capabilities

Mehwish Alam Interactive Knowledge Discovery over Web of Data 45

main_topic_orpailleurmp4
Media File (videomp4)
altering_nav_spacemp4
Media File (videomp4)
nav_across_point_of_viewmp4
Media File (videomp4)
hide_partsmp4
Media File (videomp4)
search_concept_latticemp4
Media File (videomp4)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 46

7Conclusion and Perspectives

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 50

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 51

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 52

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8References

References I

Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis

Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754

Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer

Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 56

References II

Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial

Intelligence pages 13421347

Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13

2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer

Mehwish Alam Interactive Knowledge Discovery over Web of Data 57

References III

Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June

23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer

Science pages 153168 Springer

Rudolph S (2006)Relational exploration combining description logics and formal concept

analysis for knowledge specicationPhD thesis Dresden University of Technology

Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 58

References IV

Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473

Mehwish Alam Interactive Knowledge Discovery over Web of Data 59

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S | attributes among C1 C2 C4 C5 C6 C7 C8 C9
s1 | x x x
s2 | x x x
s3 | x x
s4 | x x x
s5 | x x

[Figure: subsumption hierarchy over the nodes ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• The maximal elements of an ideal are pairwise incomparable, i.e., they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS (least common subsumer) within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
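To make the RMQ problem concrete, here is a minimal sketch that answers queries on the example array in constant time. It uses a sparse table, which is a swapped-in technique: its preprocessing takes $O(n \log n)$ time, not the $O(n)$ quoted on the slide, and the slides do not say which data structure is actually used. The function names `build_sparse_table` and `rmq` are hypothetical, and positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] = index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """1-based position of the minimal value among positions lo..hi (inclusive)."""
    i, j = lo - 1, hi - 1
    k = (j - i + 1).bit_length() - 1      # largest power of two fitting in the range
    left = table[k][i]
    right = table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 1, 5))  # position of the minimum over the whole array
print(rmq(array, table, 4, 5))  # position of the minimum over positions 4..5
```

Each query combines two overlapping power-of-two windows that together cover the range, which is why no loop over the range is needed at query time.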

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Depth D:   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:      ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤

An RMQ is computed for each pair of positions from A and B, and the result set grows step by step, for example:
- RMQ(4, 4) = 4, giving {4}
- RMQ(4, 18) = 15, giving {4, 15}
- RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21}, which reduces to {4, 21}, i.e. $\{C_1, C_{13}\}$
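The naive approach can be retraced in Python as follows. This is a sketch under assumptions: the tree is read off the node sequence on the slide (with `"TOP"` standing for ⊤), the depth array is treated as the Euler tour of that tree, the RMQ is a plain linear scan rather than a constant-time structure, and all function names are hypothetical. The final step, which drops every subsumer that is an ancestor of another one, is my reading of how {4, 8, 15, 19, 21} reduces to {4, 21}.

```python
# Tree read off the node sequence on the slide ("TOP" stands for the top element).
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Visited nodes and their depths, one entry per visit."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # back at the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, i, j):
    """1-based position of the minimum depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda pos: depths[pos - 1])


def ancestors(node, parent):
    """All proper ancestors of a node."""
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def naive_intersection(a, b):
    nodes, depths = euler_tour("TOP")
    first = {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)          # first visit of each node
    parent = {c: p for p, cs in CHILDREN.items() for c in cs}
    # One RMQ per pair: |A| * |B| queries in total.
    positions = {rmq(depths, first[x], first[y]) for x in a for y in b}
    subsumers = {nodes[pos - 1] for pos in positions}
    redundant = set()
    for node in subsumers:
        redundant |= ancestors(node, parent)  # ancestors of another subsumer
    return positions, subsumers - redundant


positions, result = naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(positions), sorted(result))
```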

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
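The scaling to filters can be sketched in a few lines of Python. The parent map below is an assumption read off the tree on the slide (with `"TOP"` standing for ⊤), and `filter_of` and `minimal_elements` are hypothetical helper names; the sketch is restricted to trees, where each node has a single parent.

```python
# Parent of each node in the tree ("TOP" stands for the top element).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """Every node that is above or equal to at least one element of the antichain."""
    result = set()
    for node in antichain:
        while True:
            result.add(node)
            if node not in PARENT:
                break
            node = PARENT[node]
    return result


def minimal_elements(nodes):
    """Nodes of the set that have no proper descendant inside the set."""
    has_descendant = set()
    for node in nodes:
        current = node
        while current in PARENT:
            current = PARENT[current]
            has_descendant.add(current)
    return nodes - has_descendant


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
print(sorted(common), sorted(minimal_elements(common)))
```

Each filter can contain up to every node of the tree, which is the source of the $O(|T|)$ cost discussed on the complexity slides.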

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


7 Conclusion and Perspectives

Conclusion

Building a classification structure over Web of Data
- Classifying RDF Data with RDF-Pattern Structures
- Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
- Completing RDF-Data using Association Rule Mining
- Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relation, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments

Thank you for your attention


8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may contain different data types and structures, including dates, numbers, collections, strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph around 350GT with edges dc:subject to dbpc:Sports_cars, dc:subject to dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date)]


Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Conclusion

Building a classication structure over Web of DataClassifying RDF Data with RDF-Pattern StructuresCreating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structureCompleting RDF-Data using Association Rule MiningData analysis and Exploration through RV-Xplorer

InteractiveKnowledgeDiscovery +Completion

Mehwish Alam Interactive Knowledge Discovery over Web of Data 47

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Mehwish Alam Interactive Knowledge Discovery over Web of Data 48

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments


Thank you for your attention

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• They may have different data types and structures, including dates, numbers, collections, strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph around the resource 350GT, with the edges dc:subject → dbpc:Sports_cars, dc:subject → dbpc:Lamborghini_vehicles, rdf:type → dbo:Automobile, dbo:manufacturer → dbp:Lamborghini, dbo:productionYear → 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values

Contexts: K_A, K_B, K_C, K_D, K_E (binary attributes a, b, c, d, e, f, g) and K_dbo:productionStartYear (interval)

Entity               a-g            dbo:productionStartYear
Reventon             × × × × × ×    ⟨[2008, 2008]⟩
Countach             × × × × ×      ⟨[1974, 1974]⟩
350GT                × × × × ×      ⟨[1963, 1963]⟩
400GT                × × × ×        ⟨[1965, 1965]⟩
Islero               × × × ×        ⟨[1967, 1967]⟩
Veneno               × ×            ⟨[2012, 2012]⟩
Aventador Roadster   × ×            -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
- $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
- $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern structure $K_p = (G, (D_p, \sqcap), \delta_p)$:
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
- $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then:

• $(A_1)^{\circ\circ}$:
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$:
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
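The numeric part of this example can be replayed with a small sketch. It assumes the usual interval reading of the production-year descriptions: the similarity of two intervals is the smallest interval containing both, and an entity belongs to the extent of an interval when its own interval lies inside it. The names `START_YEAR`, `describe`, `meet`, `intent` and `extent` are illustrative only and do not come from the slides.

```python
from functools import reduce

# Production start years from the heterogeneous context; None marks a missing value.
START_YEAR = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    "Aventador Roadster": None,
}


def describe(entity):
    """Map an entity to its interval description (year, year), or None if missing."""
    year = START_YEAR[entity]
    return None if year is None else (year, year)


def meet(i1, i2):
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))


def intent(entities):
    """Common description of a set of entities (each must have a value)."""
    return reduce(meet, (describe(e) for e in entities))


def extent(interval):
    """Entities whose description lies inside the given interval."""
    low, high = interval
    result = set()
    for entity in START_YEAR:
        d = describe(entity)
        if d is not None and low <= d[0] and d[1] <= high:
            result.add(entity)
    return result


A1 = {"350GT", "400GT", "Islero"}
pattern = intent(A1)       # common interval of the three cars
closure = extent(pattern)  # entities covered by that interval
```

Under these assumptions `pattern` is the interval from 1963 to 1967 and `closure` gives back `A1`, which is the step $(A_1)'' = A_1$ of the example.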

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S (attribute columns: C1, C2, C4, C5, C6, C7, C8, C9)
s1: x x x
s2: x x x
s3: x x
s4: x x x
s5: x x

[Figure: subsumption hierarchy rooted at ⊤ over the classes C1, C2 and C4 to C15]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $\mathit{Idl}$ can be described by the set of its maximal elements: $\mathit{Idl} = \{x \mid \exists y \in \mathrm{Max}(\mathit{Idl}) : x \leq y\}$

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is an ordered set whose elements are pairwise incomparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions for this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array)
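As an illustration of the RMQ problem on the five-element array above, here is a minimal sketch using a sparse table. Note that this is a swapped-in, simpler technique: a sparse table needs $O(n \log n)$ preprocessing rather than the $O(n)$ bound quoted on the slide, while still answering each query in $O(1)$. The function names are illustrative assumptions.

```python
def build_sparse_table(values):
    """table[k][i] is the 0-based index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """1-based position of the minimum among positions lo..hi (inclusive)."""
    i, j = lo - 1, hi - 1
    k = (j - i + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
whole_range = rmq(array, table, 1, 5)  # position of the value 0
```

For the array on the slide, the query over positions 1 to 5 returns position 3, where the minimal value 0 sits.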

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, we have the positions $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

Pos    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node   ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
D      0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, result so far: {4}
RMQ(4, 18) = 15, result so far: {4, 15}
RMQ(20, 18) = 19, result so far: {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e., the nodes C1 and C13
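The pairwise procedure can be sketched as follows. This is only an illustration under assumptions: `"T"` stands for the top element ⊤, the RMQ is done by a plain scan of the depth array rather than a constant-time structure, and the helper names (`rmq`, `is_ancestor`, `naive_meet`) are hypothetical.

```python
from itertools import product

# Tour of the taxonomy; list index i corresponds to position i + 1 on the slide.
NODES = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(lo, hi):
    """1-based position of the leftmost minimum depth between two positions."""
    lo, hi = min(lo, hi), max(lo, hi)
    window = DEPTH[lo - 1:hi]
    return lo + window.index(min(window))


def is_ancestor(u, v):
    """True if node u is a proper ancestor of node v, read off the tour."""
    first_u = NODES.index(u)
    last_u = len(NODES) - 1 - NODES[::-1].index(u)
    return u != v and first_u <= NODES.index(v) <= last_u


def naive_meet(a_positions, b_positions):
    """Run one RMQ per pair, then keep only the most specific nodes."""
    hits = {rmq(a, b) for a, b in product(a_positions, b_positions)}
    nodes = {NODES[p - 1] for p in hits}
    antichain = {n for n in nodes
                 if not any(is_ancestor(n, m) for m in nodes)}
    return hits, antichain


hits, antichain = naive_meet([4, 12, 20], [4, 18, 22])
```

With the positions from the slide, `hits` collects the positions 4, 8, 15, 19 and 21, and the final filtering keeps the nodes C1 and C13.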

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than at least one element from the antichain

[Figure: tree rooted at ⊤ with children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has the child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
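The filter computation can be checked with a short sketch. The child-to-parent map below is read off the tree used in this example, `"T"` stands for ⊤, and the function names are illustrative assumptions.

```python
# Child -> parent links of the taxonomy; "T" stands for the top element.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def ancestors_or_self(node):
    """The node together with every node above it in the tree."""
    chain = {node}
    while node in PARENT:
        node = PARENT[node]
        chain.add(node)
    return chain


def fil(antichain):
    """Filter generated by an antichain: union of the upward chains."""
    return set().union(*(ancestors_or_self(n) for n in antichain))


def minimal_elements(nodes):
    """Nodes of the set that have no proper descendant inside the set."""
    return {n for n in nodes
            if not any(m != n and n in ancestors_or_self(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
meet = minimal_elements(common)
```

Here `common` holds the six shared elements listed above and `meet` reduces them to the antichain made of C1 and C13. Each filter can be as large as the whole tree, which is why this route costs more than working on the antichains directly.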

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly
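One possible reading of the improved approach is sketched below: merge the positions of both antichains, sort them, and query only neighbouring positions. The restriction to neighbours that come from different antichains is an assumption made here so that every hit is a common subsumer; it is not stated on the slides. `"T"` stands for ⊤, the RMQ is a plain scan, and all names are hypothetical.

```python
# Tour of the taxonomy; list index i corresponds to position i + 1.
NODES = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(lo, hi):
    """1-based position of the leftmost minimum depth in lo..hi (lo <= hi)."""
    window = DEPTH[lo - 1:hi]
    return lo + window.index(min(window))


def improved_hits(a_positions, b_positions):
    """RMQs on neighbouring positions of the merged, sorted list only."""
    a_set, b_set = set(a_positions), set(b_positions)
    merged = sorted(a_set | b_set)
    # A position present in both antichains is its own common subsumer.
    hits = set(a_set & b_set)
    for p, q in zip(merged, merged[1:]):
        # Assumption: query only pairs that mix the two antichains.
        if (p in a_set and q in b_set) or (p in b_set and q in a_set):
            hits.add(rmq(p, q))
    return hits


hits = improved_hits([4, 12, 20], [4, 18, 22])
labels = {NODES[p - 1] for p in hits}
```

On the running example this issues four queries instead of nine and reaches the same candidate positions as the pairwise procedure; the candidates still have to be reduced to their most specific nodes afterwards.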

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling


Conclusion

Building a classification structure over Web of Data
• Classifying RDF Data with RDF-Pattern Structures
• Creating Views through SPARQL Queries - Lattice Based View Access

Ways to utilize this structure
• Completing RDF-Data using Association Rule Mining
• Data analysis and Exploration through RV-Xplorer

Interactive Knowledge Discovery + Completion

Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data


Interactive Exploration and KDD over Web of Data

Process of Interactively Exploring RDF Data

Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010]

• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data

• Perform large-scale experiments


Thank you for your attention

References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects

• Objects may have different data types and structures, including dates, numbers, collections, and strings

• To deal with such data, we use Pattern Structures

[Figure: RDF graph around the resource 350GT. Node labels: dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini, 1964 (xsd:date). Edge labels: dc:subject (twice), rdf:type, dbo:manufacturer, dbo:productionYear.]

Why Pattern Structures

Heterogeneous context with numeric values

Contexts: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a, b, c, d, e, f, g) and $K_{dbo:productionStartYear}$ (intervals)

Reventon: × × × × × ×, ⟨[2008, 2008]⟩
Countach: × × × × ×, ⟨[1974, 1974]⟩
350GT: × × × × ×, ⟨[1963, 1963]⟩
400GT: × × × ×, ⟨[1965, 1965]⟩
Islero: × × × ×, ⟨[1967, 1967]⟩
Veneno: × ×, ⟨[2012, 2012]⟩
Aventador Roadster: × ×, -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
- $range(p) \subseteq U$: $p$ is an object relation
- $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
- $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$
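The slide defines $H$ as a Cartesian product but does not spell out how two tuples are compared. A minimal Python sketch, assuming the similarity on $H$ is taken component-wise (set intersection for binary attributes, smallest covering interval for years), could look as follows. The function names and the attribute sets assigned to each car are hypothetical; only the production years come from the context table.

```python
def meet_set(x, y):
    """Similarity of two attribute sets: plain set intersection."""
    return x & y


def meet_interval(x, y):
    """Similarity of two intervals: the smallest interval covering both."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def meet_tuple(d1, d2, meets):
    """Component-wise similarity of two heterogeneous descriptions."""
    return tuple(m(a, b) for m, a, b in zip(meets, d1, d2))


# One similarity operation per component of the tuple (an assumption of this sketch).
MEETS = (meet_set, meet_interval)

# Hypothetical attribute sets (column positions are not recoverable from the slide);
# the intervals are the production start years listed in the context.
DELTA = {
    "350GT": (frozenset("abcde"), (1963, 1963)),
    "400GT": (frozenset("abcd"), (1965, 1965)),
    "Islero": (frozenset("abcd"), (1967, 1967)),
}

common = DELTA["350GT"]
for name in ("400GT", "Islero"):
    common = meet_tuple(common, DELTA[name], MEETS)

print(common)
```

Under these assumptions the common description of the three cars is the attribute set {a, b, c, d} paired with the interval (1963, 1967).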

Heterogeneous Pattern Structures

Using the heterogeneous context with numeric values above, let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$. Then:

• $(A_1)^{\circ\circ}$
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
- $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$, i.e., “Automobiles manufactured by Lamborghini between 1963 and 1967”
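The interval part of this example can be checked with a short Python sketch. It uses the production start years from the context table; the helper names are hypothetical, and the closure of the binary part is copied from the slide rather than recomputed, because the column positions of the crosses are not available here.

```python
# Production start years from the context; None marks the missing value.
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}


def interval_of(entities):
    """Smallest interval covering the years of the given entities."""
    years = [YEARS[g] for g in entities]
    return (min(years), max(years))


def extent_of(interval):
    """All entities whose year is known and falls inside the interval."""
    lo, hi = interval
    return {g for g, y in YEARS.items() if y is not None and lo <= y <= hi}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)          # interval description of A1
closure_years = extent_of(interval) # closure of A1 in the interval context

# Closure of A1 in the binary contexts, as stated on the slide.
closure_binary = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

# Heterogeneous closure: intersection of the component closures.
closure = closure_binary & closure_years
print(interval, sorted(closure))
```

The intersection step mirrors the last bullet of the slide: Reventon and Countach share the binary attributes but fall outside the interval, so they are not part of the heterogeneous closure.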

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context

• Extended set of attributes $M^*$ organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$

Entities S against attributes C1 C2 C4 C5 C6 C7 C8 C9:
s1: x x x
s2: x x x
s3: x x
s4: x x x
s5: x x

[Figure: subsumption hierarchy with top element ⊤, intermediate nodes C10, C11, C12, C13, C14, C15, and attributes C1, C2, C4, C5, C6, C7, C8, C9]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$

• The maximal elements of an order ideal are pairwise not comparable, i.e., they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal

• Recall that an antichain is a subset of an ordered set whose elements are pairwise not comparable

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$
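The infimum of two antichains can be computed directly from this definition. The Python sketch below is a brute-force illustration, not an efficient method; the function names are hypothetical, and the divisibility order on the integers 1 to 12 is used only as a stand-in poset that is easy to verify by hand.

```python
def down_set(antichain, universe, leq):
    """Order ideal generated by an antichain: everything below some element."""
    return {m for m in universe if any(leq(m, a) for a in antichain)}


def maximal(elements, leq):
    """Elements that are not strictly below any other element of the set."""
    return {x for x in elements
            if not any(x != y and leq(x, y) for y in elements)}


def infimum(ac1, ac2, universe, leq):
    """Maximal elements of {m | m <= some ac1 and m <= some ac2}."""
    common = down_set(ac1, universe, leq) & down_set(ac2, universe, leq)
    return maximal(common, leq)


# Stand-in poset (an assumption of this sketch): m <= n iff m divides n.
universe = range(1, 13)


def divides(m, n):
    return n % m == 0


print(infimum({4, 9}, {6}, universe, divides))
print(infimum({4, 6}, {6, 10}, universe, divides))
```

In the first call the common lower bounds are 1, 2 and 3, of which 2 and 3 are maximal; in the second call 6 lies below an element of both antichains and dominates every other common lower bound.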

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array

Array:     [ 2 1 0 3 2 ]
Positions:   1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array)
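As an illustration of RMQ with constant-time queries, here is a small Python sketch based on a sparse table. Note the swap: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ bound quoted on the slide, which requires a more involved construction. The class name and the 1-based interface are choices made for this sketch so that it matches the positions shown above.

```python
class SparseTableRMQ:
    """Range minimum queries: O(n log n) preprocessing, O(1) per query."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] is the 0-based index of the leftmost minimum
        # in the window of length 2**k starting at index i.
        self.table = [list(range(n))]
        span = 1
        while 2 * span <= n:
            prev = self.table[-1]
            row = []
            for i in range(n - 2 * span + 1):
                left, right = prev[i], prev[i + span]
                row.append(left if self.values[left] <= self.values[right] else right)
            self.table.append(row)
            span *= 2

    def query(self, lo, hi):
        """1-based position of the leftmost minimum in positions lo..hi."""
        if lo > hi:
            lo, hi = hi, lo
        i, j = lo - 1, hi - 1
        level = (j - i + 1).bit_length() - 1
        # Two overlapping windows of length 2**level cover the whole range.
        left = self.table[level][i]
        right = self.table[level][j - (1 << level) + 1]
        best = left if self.values[left] <= self.values[right] else right
        return best + 1


rmq = SparseTableRMQ([2, 1, 0, 3, 2])
print(rmq.query(1, 5), rmq.query(4, 5), rmq.query(1, 2))
```

For the array on the slide, the minimum over positions 1 to 5 is the value 0 at position 3.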

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$

D           0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node        ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Position    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

RMQ(4, 4) = 4, giving {4}
RMQ(4, 18) = 15, giving {4, 15}
RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}, i.e., the antichain $\{C_1, C_{13}\}$
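The walk-through above can be reproduced with a short Python sketch. It reads the node row as a traversal of the tree in which each node is written down every time it is visited (an interpretation, since the slide does not name the construction), uses a linear scan in place of a constant-time RMQ structure, and writes ⊤ as `TOP`. The function names are hypothetical.

```python
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(lo, hi):
    """1-based position of the leftmost minimal depth in lo..hi (linear scan)."""
    if lo > hi:
        lo, hi = hi, lo
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def naive_positions(a, b):
    """One RMQ per pair of positions, i.e. |A| * |B| queries."""
    return {rmq(x, y) for x in a for y in b}


def most_specific(positions):
    """Drop every node that is a strict ancestor of another node in the result."""
    labels = {NODES[p - 1] for p in positions}
    first = {n: NODES.index(n) for n in labels}
    last = {n: len(NODES) - 1 - NODES[::-1].index(n) for n in labels}

    def is_strict_ancestor(u, v):
        # u is an ancestor of v when v is first visited between
        # the first and the last visit of u.
        return u != v and first[u] <= first[v] <= last[u]

    return {v for v in labels
            if not any(is_strict_ancestor(v, w) for w in labels)}


A = {4, 12, 20}
B = {4, 18, 22}
positions = naive_positions(A, B)
print(sorted(positions), sorted(most_specific(positions)))
```

The position set matches the slide's intermediate result, and removing the more general nodes (C12 and ⊤) leaves C1 and C13.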

An associated scaling

• Scale the antichains to the corresponding filters

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain

[Figure: tree with root ⊤; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has child C13 (leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$
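The scaling to filters is easy to sketch in Python once the tree is written as a child-to-parent map. The map below is read off the tree used in the slides (with ⊤ written as `TOP`); the function names are hypothetical.

```python
# Child -> parent map of the example tree; TOP is the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes that are equal to or above some element of the antichain."""
    result = set()
    for node in antichain:
        while node is not None:
            result.add(node)
            node = PARENT.get(node)  # None once the root has been added
    return result


def minimal_elements(nodes):
    """Nodes of the set that are not strict ancestors of another node in it."""
    strict_ancestors = set()
    for n in nodes:
        strict_ancestors |= filter_of({n}) - {n}
    return nodes - strict_ancestors


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
shared = filter_of(A) & filter_of(B)
print(sorted(shared), sorted(minimal_elements(shared)))
```

Each filter contains one path to the root per element, which is why this route touches many more nodes than the RMQ-based one.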

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements from $A$ and $B$

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling
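This part of the slides gives the cost of the improved approach but not its steps. One plausible reading, sketched below in Python, is to merge the positions of $A$ and $B$ in sorted order and to issue an RMQ only for neighbouring positions that come from different sets. This is an assumption of the sketch, not a transcription of the authors' algorithm; the linear-scan `rmq` and the function names are likewise illustrative.

```python
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(lo, hi):
    """1-based position of the leftmost minimal depth in lo..hi (linear scan)."""
    if lo > hi:
        lo, hi = hi, lo
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def improved_positions(a, b):
    """RMQs between consecutive positions of the merged list, cross-set pairs only."""
    # A position present in both sets appears twice, once per tag.
    tagged = sorted({(p, "A") for p in a} | {(p, "B") for p in b})
    result = set()
    for (p, tag_p), (q, tag_q) in zip(tagged, tagged[1:]):
        if tag_p != tag_q:
            result.add(rmq(p, q))
    return result


A = {4, 12, 20}
B = {4, 18, 22}
positions = improved_positions(A, B)
print(sorted(positions), sorted({NODES[p - 1] for p in positions}))
```

The merged list has at most $|A| + |B|$ entries, so at most $|A| + |B| - 1$ queries are issued. On the running example this yields the same position set as the pairwise scan, and the final reduction to the most specific nodes is unchanged.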

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation amp Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 75: Interactive Knowledge Discovery over Web of Data

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 49

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 50

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 51

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 52

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8References

References I

Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis

Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754

Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer

Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 56

References II

Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial

Intelligence pages 13421347

Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13

2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer

Mehwish Alam Interactive Knowledge Discovery over Web of Data 57

References III

Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June

23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer

Science pages 153168 Springer

Rudolph S (2006)Relational exploration combining description logics and formal concept

analysis for knowledge specicationPhD thesis Dresden University of Technology

Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 58

References IV

Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473

Mehwish Alam Interactive Knowledge Discovery over Web of Data 59

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 64

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 65

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 66

The proposition of [Ganter and Kuznetsov 2001]

sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets

sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order

sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu

sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal

sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable

Mehwish Alam Interactive Knowledge Discovery over Web of Data 67

The proposition of [Ganter and Kuznetsov 2001]

sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set

sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query

sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array

Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5

Computational Time

The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
D:          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node:       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤

RMQ(4, 4) = 4, giving {4}
RMQ(4, 18) = 15, giving {4, 15}
RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}
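The pairwise computation can be replayed in a few lines. This is a minimal sketch under stated assumptions: the tree is read off the node sequence on the slide, `"TOP"` stands for ⊤, `rmq` is a plain linear scan rather than a constant-time structure, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TOP = "TOP"  # stands for the top element of the tree

# Tree read off the node sequence on the slide: parent -> children.
CHILDREN: Dict[str, List[str]] = {
    TOP: ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root: str) -> Tuple[List[str], List[int]]:
    """Return the visited nodes E and their depths D (index 0 <-> position 1)."""
    nodes: List[str] = []
    depths: List[int] = []

    def visit(node: str, depth: int) -> None:
        nodes.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)  # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


E, D = euler_tour(TOP)
FIRST: Dict[str, int] = {}
for position, node in enumerate(E, start=1):
    FIRST.setdefault(node, position)  # first occurrence of each node


def rmq(i: int, j: int) -> int:
    """1-based position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def is_proper_ancestor(a: str, b: str) -> bool:
    """True if a lies strictly above b in the tree."""
    return a != b and E[rmq(FIRST[a], FIRST[b]) - 1] == a


def naive_intersection(A: Set[str], B: Set[str]) -> Set[str]:
    # One RMQ per pair (a, b): |A| * |B| queries in total.
    candidates = {E[rmq(FIRST[a], FIRST[b]) - 1] for a in A for b in B}
    # Keep only the most specific candidates (drop ancestors of other candidates).
    return {c for c in candidates
            if not any(is_proper_ancestor(c, d) for d in candidates)}


print(naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

For the example, the candidate nodes are those at positions 4, 8, 15, 19 and 21 (19 and 21 are both C13), and dropping the ancestors leaves C1 and C13, i.e. the positions {4, 21} of the result line.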

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain.

[Figure: tree with root ⊤. ⊤ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has child C13 (with leaves C7, C8, C9).]

$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
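The scaling route can be checked on the same example. The sketch below assumes the tree shown on the slide, encoded as hypothetical child-to-parent links with `"TOP"` standing for ⊤; `fil` and `antichain_of` are illustrative names.

```python
from typing import Dict, Optional, Set

TOP = "TOP"  # stands for the top element of the tree

# Child -> parent links of the tree drawn on the slide.
PARENT: Dict[str, Optional[str]] = {
    TOP: None,
    "C12": TOP, "C6": TOP,
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def fil(antichain: Set[str]) -> Set[str]:
    """Filter of an antichain: its elements together with all their ancestors."""
    result: Set[str] = set()
    for node in antichain:
        current: Optional[str] = node
        # Stop climbing as soon as an already collected ancestor is reached.
        while current is not None and current not in result:
            result.add(current)
            current = PARENT[current]
    return result


def antichain_of(up_set: Set[str]) -> Set[str]:
    """Most specific elements of an upward-closed set: those with no child in it."""
    has_child_inside = {PARENT[node] for node in up_set}
    return {node for node in up_set if node not in has_child_inside}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
print(sorted(common))
print(sorted(antichain_of(common)))
```

Each filter is built by walking up to the root, so its size grows with the height of the tree, which is the cost the complexity discussion attributes to this approach.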

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.
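One possible reading of the improved approach is sketched below: merge the positions of both antichains and issue an RMQ only for consecutive positions that come from different antichains. This reading is an assumption, not a transcription of the author's algorithm. The arrays `E` and `D` are copied from the naive-approach slide (with `"TOP"` for ⊤), `rmq` is again a linear scan standing in for a constant-time structure, and the final filtering step is written for clarity rather than speed.

```python
from typing import Dict, List, Set, Tuple

# Node sequence E and depth array D copied from the slide ("TOP" stands for the top element).
E: List[str] = (
    "TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 "
    "TOP C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP"
).split()
D: List[int] = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
                1, 2, 3, 2, 3, 2, 3, 2, 1, 0]

FIRST: Dict[str, int] = {}
for position, node in enumerate(E, start=1):
    FIRST.setdefault(node, position)  # first occurrence of each node


def rmq(i: int, j: int) -> int:
    """1-based position of the minimal depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def improved_intersection(A: Set[str], B: Set[str]) -> Tuple[Set[str], int]:
    """Return the resulting antichain and the number of RMQs that were issued."""
    pos_a = {FIRST[a] for a in A}
    pos_b = {FIRST[b] for b in B}
    merged = sorted(pos_a | pos_b)   # at most |A| + |B| positions
    found = pos_a & pos_b            # shared elements belong to the result candidates
    queries = 0
    for x, y in zip(merged, merged[1:]):
        # Only neighbours coming from different antichains need a query.
        if (x in pos_a and y in pos_b) or (x in pos_b and y in pos_a):
            found.add(rmq(x, y))
            queries += 1
    candidates = {E[p - 1] for p in found}

    def is_proper_ancestor(a: str, b: str) -> bool:
        return a != b and E[rmq(FIRST[a], FIRST[b]) - 1] == a

    result = {c for c in candidates
              if not any(is_proper_ancestor(c, d) for d in candidates)}
    return result, queries


print(improved_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"}))
```

On the running example the merged positions are 4, 12, 18, 20, 22, so four RMQs are issued instead of the nine pairwise ones, and the collected positions are exactly {4, 8, 15, 19, 21}.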

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence.

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF-Pattern Structures and Heterogeneous Pattern Structures, to take into account multi-relational data.

• Perform large-scale experiments.


Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may contain different data types and structures, including dates, numbers, collections and strings.

• To deal with such data, we use Pattern Structures.

[Figure: RDF graph around the resource 350GT, with edges dc:subject to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to 1964 (xsd:date).]

Why Pattern Structures

Contexts $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ (binary attributes a to f) and $K_{dbo:productionStartYear}$ (attribute g):

Entity             | a-f (crosses) | g
Reventon           | × × × × × ×   | ⟨[2008, 2008]⟩
Countach           | × × × × ×     | ⟨[1974, 1974]⟩
350GT              | × × × × ×     | ⟨[1963, 1963]⟩
400GT              | × × × ×       | ⟨[1965, 1965]⟩
Islero             | × × × ×       | ⟨[1967, 1967]⟩
Veneno             | × ×           | ⟨[2012, 2012]⟩
Aventador Roadster | × ×           | -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - if $range(p) \subseteq U$, $p$ is an object relation;
  - if $range(p) \subseteq L$, $p$ is a literal relation.

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$;
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$.

• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \prod D_p$ $(p \in P)$ is the Cartesian product of all the description sets $D_p$;
  - $\Delta$ maps an entity $g \in G$ to a tuple in which each component corresponds to a description in a set $D_p$.

Heterogeneous Pattern Structures

In the heterogeneous context with numeric values, let $A_1 = \{350GT, 400GT, Islero\}$. Then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
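The numeric part of this closure can be reproduced with an interval pattern structure. The sketch below is illustrative only: the years are taken from the context above, the similarity of two intervals is assumed to be their convex hull, and the names `meet`, `describe` and `extent` are hypothetical. The closure of the binary part is copied from the slide, because the column positions of the crosses are not recoverable from the transcript.

```python
from functools import reduce
from typing import Dict, Iterable, Optional, Set, Tuple

Interval = Tuple[int, int]

# dbo:productionStartYear values from the context; None marks the missing value ("-").
START_YEAR: Dict[str, Optional[int]] = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
    "Aventador Roadster": None,
}


def meet(x: Interval, y: Interval) -> Interval:
    """Similarity of two intervals: the smallest interval containing both."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def describe(entities: Iterable[str]) -> Interval:
    """A' for a set of entities: the meet of their interval descriptions."""
    intervals = []
    for g in entities:
        year = START_YEAR[g]
        if year is None:
            raise ValueError(f"{g} has no production start year")
        intervals.append((year, year))
    return reduce(meet, intervals)


def extent(interval: Interval) -> Set[str]:
    """d' for an interval: all entities whose year lies inside it."""
    low, high = interval
    return {g for g, year in START_YEAR.items()
            if year is not None and low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
pattern = describe(A1)
print(pattern)          # interval description shared by A1
print(extent(pattern))  # entities covered by that interval

# Closure of A1 in the binary part, copied from the slide (not recomputed here).
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}
print(BINARY_CLOSURE & extent(pattern))  # heterogeneous closure of A1
```

Intersecting the two closures mirrors the last derivation step on the slide: Reventon and Countach share the binary attributes of $A_1$ but fall outside the interval, so they are excluded.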

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Entities | Crosses (columns C1 C2 C4 C5 C6 C7 C8 C9)
s1       | x x x (C1, C2, C7)
s2       | x x x (C6, C8, C9)
s3       | x x
s4       | x x x
s5       | x x

[Figure: subsumption hierarchy with top element ⊤ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9.]


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \le_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$.

• Maximal elements from different ideals are not comparable, i.e., the maximal elements form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

The proposition of [Ganter and Kuznetsov 2001]

sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set

sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query

sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array

Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5

Computational Time

The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 4q ldquo 4t4u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquot4 u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquo 15t4 15u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquot4 15 u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

An associated scaling

sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain

J

C12

C10

C1 C2

C11

C4 C5

C6

C13

C7 C8 C9

A ldquo tC1C5C8u

B ldquo tC1C7C9u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 70

An associated scaling

sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain

J

C12

C10

C1 C2

C11

C4 C5

C6

C13

C7 C8 C9

A ldquo tC1C5C8u FilpAq ldquo tC1C10C12JC5C11C8C13C6u

B ldquo tC1C7C9u FilpBq ldquo tC1C10C12JC7C9C13C6u

FilpAq X FilpBq ldquo tC1C10C12JC13C6u which yields the antichain tC1C13u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 70

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is Op|A||B|q

Complexity of Improved Approach

The cardinality of AY B is less then |A| ` |B| hence the number of theconsecutive elements is Op|A| ` |B|q and thus the number of RMQs ofconsecutive elements is Op|A| ` |B|q

Complexity of associated scaling

The approach based on lters has a higher complexity than the approach basedon RMQ The size of a lter is Op|T |q and thus the computational complexityof intersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly

Mehwish Alam Interactive Knowledge Discovery over Web of Data 71

About complexity of the approach

sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly

sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree

sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction

sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling

Mehwish Alam Interactive Knowledge Discovery over Web of Data 72

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation amp Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 77: Interactive Knowledge Discovery over Web of Data

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 51

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 52

Perspectives

sbquo Perform attribute exploration over these ranked rules for knowledge basecompletion [Sertkaya 2010]

sbquo How to deal with near-denitions ie X Ntilde Y and Y Ntilde X are associationrules with high condence

sbquo Consider complete schema instead of only taxonomy ie the property andsub-property relation relations between classes etc

sbquo Improve these frameworks ie RDF-Pattern Structures and HeterogeneousPattern Structures to take into account multi-relational data

sbquo Perform large-scale experiments

Mehwish Alam Interactive Knowledge Discovery over Web of Data 53

Thank you for your attention

Mehwish Alam Interactive Knowledge Discovery over Web of Data 54

Mehwish Alam Interactive Knowledge Discovery over Web of Data 55

8References

References I

Codocedo V and Napoli A (2014)A Proposition for Combining Pattern Structures and Relational ConceptAnalysisIn 12th International Conference on Formal Concept Analysis

Ferreacute S and Hermann A (2012)Reconciling faceted search and query languages for the semantic webIJMSO 7(1)3754

Ganter B and Kuznetsov S O (2001)Pattern structures and their projectionsIn Delugach H S and Stumme G editors ICCS volume 2120 of LectureNotes in Computer Science pages 129142 Springer

Ganter B and Wille R (1999)Formal Concept Analysis Mathematical FoundationsSpringer BerlinHeidelberg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 56

References II

Kaytoue M Kuznetsov S O and Napoli A (2011)Revisiting numerical pattern mining with formal concept analysisIn Proceedings of the 22nd International Joint Conference on Articial

Intelligence pages 13421347

Kuznetsov S O and Samokhin M V (2005)Learning closed sets of labeled graphs for chemical applicationsIn Kramer S and Pfahringer B editors Inductive Logic Programming15th International Conference ILP 2005 Bonn Germany August 10-13

2005 Proceedings volume 3625 of Lecture Notes in Computer Sciencepages 190208 Springer

Mehwish Alam Interactive Knowledge Discovery over Web of Data 57

References III

Leeuwenberg A Buzmakov A Toussaint Y and Napoli A (2015)Exploring pattern structures of syntactic trees for relation extractionIn Baixeries J Sacarea C and Ojeda-Aciego M editors Formal ConceptAnalysis - 13th International Conference ICFCA 2015 Nerja Spain June

23-26 2015 Proceedings volume 9113 of Lecture Notes in Computer

Science pages 153168 Springer

Rudolph S (2006)Relational exploration combining description logics and formal concept

analysis for knowledge specicationPhD thesis Dresden University of Technology

Sertkaya B (2010)A survey on how description logic ontologies benet from fcaIn Kryszkiewicz M and Obiedkov S A editors CLA volume 672 of CEURWorkshop Proceedings pages 221 CEUR-WSorg

Mehwish Alam Interactive Knowledge Discovery over Web of Data 58

References IV

Visani M Bertet K and Ogier J (2011)Navigala an original symbol classier based on navigation through a galoislatticeIJPRAI 25(4)449473

Mehwish Alam Interactive Knowledge Discovery over Web of Data 59

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 64

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 65

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 66

The proposition of [Ganter and Kuznetsov 2001]

sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets

sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order

sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu

sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal

sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable

Mehwish Alam Interactive Knowledge Discovery over Web of Data 67

The proposition of [Ganter and Kuznetsov 2001]

sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set

sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e., Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
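A range minimum query over the array above can be sketched as follows. Note the assumption: this sketch uses a sparse table, which needs $O(n \log n)$ preprocessing rather than the $O(n)$ scheme quoted on the slide, but still answers each query in $O(1)$. The function names `build_sparse_table` and `rmq` are hypothetical, and positions are 1-based to match the slide.

```python
def build_sparse_table(values):
    """table[j][i] holds the 0-based index of the leftmost minimum of
    values[i : i + 2**j]. Built in O(n log n) time."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(values, table, lo, hi):
    """1-based position of the leftmost minimum in positions lo..hi
    (inclusive). The two bounds may be given in either order."""
    if lo > hi:
        lo, hi = hi, lo
    lo, hi = lo - 1, hi - 1                # switch to 0-based indices
    k = (hi - lo + 1).bit_length() - 1     # largest power of two that fits
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


ARRAY = [2, 1, 0, 3, 2]
TABLE = build_sparse_table(ARRAY)
print(rmq(ARRAY, TABLE, 1, 5))  # position of the minimum of the whole array
print(rmq(ARRAY, TABLE, 4, 5))  # position of the minimum of the last two
```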

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the positions in the Euler tour of the tree are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node       ⊤ C12 C10  C1 C10  C2 C10 C12 C11  C4 C11  C5 C11 C12   ⊤  C6 C13  C7 C13  C8 C13  C9 C13  C6   ⊤
Depth D    0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

The RMQs are evaluated pair by pair, accumulating the resulting positions:

$\mathrm{RMQ}(4, 4) = 4$, giving $\{4\}$
$\mathrm{RMQ}(4, 18) = 15$, giving $\{4, 15\}$
$\mathrm{RMQ}(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$: positions 8, 15, 19 and 21 hold $C_{12}$, $\top$, $C_{13}$ and $C_{13}$, so keeping only the most specific nodes leaves the antichain $\{C_1, C_{13}\}$.
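The walk-through above can be reproduced with a short Python sketch. It is an illustration under assumptions, not the authors' code: the tree is read off the Euler tour shown on the slide, `"TOP"` stands for the top element, the RMQ is a plain linear scan for brevity (a constant-time structure would replace it in practice), and all function names are hypothetical.

```python
# Tree read off the Euler tour of the slide, as a parent -> children map.
# "TOP" stands for the top element of the hierarchy.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(children, root):
    """Return the Euler tour of the tree and the depth of each visit."""
    tour, depth = [], []

    def visit(node, d):
        tour.append(node)
        depth.append(d)
        for child in children.get(node, []):
            visit(child, d + 1)
            tour.append(node)   # come back to the parent after each child
            depth.append(d)

    visit(root, 0)
    return tour, depth


TOUR, DEPTH = euler_tour(CHILDREN, "TOP")

# First 1-based position of every node in the tour.
FIRST = {}
for pos, node in enumerate(TOUR, start=1):
    FIRST.setdefault(node, pos)


def rmq(i, j):
    """1-based position of the leftmost minimum depth between i and j.
    Linear scan, used here only to keep the sketch short."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def lcs(u, v):
    """Lowest common ancestor of u and v, obtained through one RMQ."""
    return TOUR[rmq(FIRST[u], FIRST[v]) - 1]


def naive_intersection(a, b):
    """One RMQ per pair (u, v), then keep only the most specific nodes."""
    candidates = {lcs(u, v) for u in a for v in b}
    return {c for c in candidates
            if not any(c != d and lcs(c, d) == c for d in candidates)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(FIRST[x] for x in A), sorted(FIRST[x] for x in B))
print(naive_intersection(A, B))
```

The final filtering step discards every candidate that is a proper ancestor of another candidate, which corresponds to reducing $\{4, 8, 15, 19, 21\}$ to $\{C_1, C_{13}\}$ on the slide.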

An associated scaling

• Scale the antichains to the corresponding filters.
• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element of the antichain.

[Figure: tree used in the example]
⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
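The scaling to filters can also be sketched in Python. Again this is a hedged illustration rather than the authors' implementation: the child -> parent map is read off the tree of the example, `"TOP"` stands for the top element, and the names `PARENT`, `proper_ancestors`, `fil`, `minimal` and `scaled_intersection` are hypothetical.

```python
# Child -> parent map of the example tree ("TOP" is the top element).
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def proper_ancestors(node):
    """All nodes strictly above the given node, up to the top element."""
    above = set()
    while node in PARENT:
        node = PARENT[node]
        above.add(node)
    return above


def fil(antichain):
    """Filter of an antichain: its elements plus everything above them."""
    result = set(antichain)
    for node in antichain:
        result |= proper_ancestors(node)
    return result


def minimal(nodes):
    """Most specific nodes of a set: those not above another member."""
    above = set()
    for node in nodes:
        above |= proper_ancestors(node)
    return nodes - above


def scaled_intersection(a, b):
    """Intersect the two filters, then read the antichain back."""
    return minimal(fil(a) & fil(b))


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(fil(A) & fil(B)))
print(scaled_intersection(A, B))
```

Each filter can contain a large part of the tree, which is the cost the complexity discussion attributes to this approach.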

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, one for each pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.


Perspectives

• Perform attribute exploration over these ranked rules for knowledge base completion [Sertkaya 2010].

• How to deal with near-definitions, i.e., cases where $X \rightarrow Y$ and $Y \rightarrow X$ are association rules with high confidence?

• Consider the complete schema instead of only the taxonomy, i.e., the property and sub-property relations, relations between classes, etc.

• Improve these frameworks, i.e., RDF Pattern Structures and Heterogeneous Pattern Structures, to take multi-relational data into account.

• Perform large-scale experiments.

Thank you for your attention


8 References

References I

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

References II

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

References III

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

References IV

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections, and strings.

• To deal with such data, we use Pattern Structures.

[Figure: RDF graph around the entity 350GT, with the nodes dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date), and edges labeled dc:subject (twice), rdf:type, dbo:manufacturer and dbo:productionYear]

Why Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections, and strings.

• To deal with such data, we use Pattern Structures.

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear
Attributes: a b c d e f g
[× marks are listed per entity without their column alignment; the last entry is the value of g]

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear
Attributes: a b c d e f g
[× marks are listed per entity without their column alignment; the last entry is the value of g]

Reventon            × × × × × ×   ⟨[2008, 2008]⟩
Countach            × × × × ×     ⟨[1974, 1974]⟩
350GT               × × × × ×     ⟨[1963, 1963]⟩
400GT               × × × ×       ⟨[1965, 1965]⟩
Islero              × × × ×       ⟨[1967, 1967]⟩
Veneno              × ×           ⟨[2012, 2012]⟩
Aventador Roadster  × ×           -

Let $A_1 = \{350GT, 400GT, Islero\}$, then

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
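The numeric part of the derivation above can be checked with a small Python sketch. It is an illustration under assumptions, not the authors' implementation: the years are those listed in the context, the extent of $\{a, b, c, d\}$ is copied from the slide because the binary columns themselves are not available here, and the names `YEARS`, `describe`, `extent` and `BINARY_CLOSURE` are hypothetical.

```python
# Production start years as listed in the context of the slides.
# None marks the entity without a value ("-" in the table).
YEARS = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963, "400GT": 1965,
    "Islero": 1967, "Veneno": 2012, "Aventador Roadster": None,
}


def describe(entities):
    """(A)': smallest interval covering the years of all entities in A.
    Returning None for an empty set or a missing year is an assumption
    of this sketch, not something stated on the slides."""
    years = [YEARS[g] for g in entities]
    if not years or any(y is None for y in years):
        return None
    return (min(years), max(years))


def extent(interval):
    """[l, u]': entities whose year lies inside the interval."""
    low, high = interval
    return {g for g, y in YEARS.items()
            if y is not None and low <= y <= high}


A1 = {"350GT", "400GT", "Islero"}

# Extent of {a, b, c, d} in the binary part, copied from the slide.
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

interval = describe(A1)
heterogeneous_closure = BINARY_CLOSURE & extent(interval)
print(interval)
print(heterogeneous_closure == A1)
```

Intersecting the two extents mirrors the last step of the slide: the binary attributes alone would also keep Reventon and Countach, and the interval $[1963, 1967]$ is what excludes them.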







The proposition of [Ganter and Kuznetsov 2001]

sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets

sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order

sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu

sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal

sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable

Mehwish Alam Interactive Knowledge Discovery over Web of Data 67

The proposition of [Ganter and Kuznetsov 2001]

sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set

sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query

sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array

Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5

Computational Time

The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$, the corresponding positions are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

An RMQ is issued for each pair of positions, one from A and one from B, and the answers are accumulated. For example:

- RMQ(4, 4) = 4, giving {4}
- RMQ(4, 18) = 15, giving {4, 15}
- RMQ(20, 18) = 19, giving {4, 15, 19}

Result = {4, 8, 15, 19, 21} = {4, 21}: only the most specific nodes are kept, which corresponds to the antichain $\{C_1, C_{13}\}$.
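The naive approach can be sketched as follows. The tree is read off the Euler tour shown on the slide, with `"TOP"` standing for the top element. The final filtering step (dropping every result that is an ancestor of another result) is an assumption about how {4, 8, 15, 19, 21} becomes {4, 21}, and a linear scan stands in for a constant-time RMQ.

```python
# Tree read off the Euler tour on the slide; "TOP" stands for the top element.
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths; index i is slide position i + 1."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)      # come back to the parent after each child
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour(TREE, "TOP")
FIRST, LAST = {}, {}
for index, node in enumerate(NODES):
    FIRST.setdefault(node, index)
    LAST[node] = index


def lcs(u, v):
    """Shallowest node of the tour between the first occurrences of u and v."""
    lo, hi = sorted((FIRST[u], FIRST[v]))
    # A linear scan stands in for the constant-time RMQ of the slides.
    return NODES[min(range(lo, hi + 1), key=DEPTHS.__getitem__)]


def most_specific(candidates):
    """Drop every candidate that is a proper ancestor of another candidate."""
    return {u for u in candidates
            if not any(u != v and FIRST[u] <= FIRST[v] <= LAST[u]
                       for v in candidates)}


def naive_intersection(a, b):
    """One query per pair (u, v), hence |A| * |B| queries."""
    return most_specific({lcs(u, v) for u in a for v in b})


print(sorted(naive_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
# ['C1', 'C13']
```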

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain.

[Figure: tree rooted at ⊤ with two subtrees: C12 (children C10 with leaves C1, C2, and C11 with leaves C4, C5) and C6 (child C13 with leaves C7, C8, C9)]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
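The scaling to filters can be reproduced with a short sketch. The parent map encodes the same tree as in the figure, `"TOP"` stands for the top element, and the function names are illustrative assumptions.

```python
# Child -> parent map for the tree of the figure; "TOP" stands for the top element.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def fil(antichain):
    """Filter of an antichain: every node at or above one of its elements."""
    result = set()
    for node in antichain:
        # Climb towards the root; stop early once the path is already covered.
        while node is not None and node not in result:
            result.add(node)
            node = PARENT.get(node)
    return result


def intersect_by_scaling(a, b):
    """Intersect the two filters, then keep the most specific common nodes."""
    common = fil(a) & fil(b)
    strictly_above = set()
    for node in common:
        strictly_above |= fil({node}) - {node}
    return common - strictly_above


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect_by_scaling(A, B)))  # ['C1', 'C13']
```

Each filter is materialised in full, which is where the dependence on the size of the tree comes from.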

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is more costly than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
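The slides state only the cost of the improved approach, not its steps. The sketch below shows one possible reading, which is an assumption: the positions of A and B are merged in Euler-tour order, and a query is issued only for consecutive positions that come from different sets, so at most |A| + |B| - 1 queries are needed. The tree, the name `"TOP"` for the top element, the linear-scan query, and the final filtering are illustrative choices.

```python
# Same tree as on the slides; "TOP" stands for the top element.
TREE = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths along the Euler tour."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


NODES, DEPTHS = euler_tour(TREE, "TOP")
FIRST, LAST = {}, {}
for index, node in enumerate(NODES):
    FIRST.setdefault(node, index)
    LAST[node] = index


def improved_intersection(a, b):
    """Query only neighbouring positions of the merged order that mix A and B."""
    merged = sorted([(FIRST[u], 0) for u in a] + [(FIRST[v], 1) for v in b])
    candidates = set()
    for (p, side_p), (q, side_q) in zip(merged, merged[1:]):
        if side_p != side_q:
            # A linear scan stands in for a constant-time RMQ.
            candidates.add(NODES[min(range(p, q + 1), key=DEPTHS.__getitem__)])
    # Keep only the candidates that are not proper ancestors of another one.
    return {u for u in candidates
            if not any(u != v and FIRST[u] <= FIRST[v] <= LAST[u]
                       for v in candidates)}


print(sorted(improved_intersection({"C1", "C5", "C8"}, {"C1", "C7", "C9"})))
# ['C1', 'C13']
```

Skipping neighbours from the same set matters: for A = {C1, C2} and B = {C7}, querying C1 against C2 would return C10, which does not subsume any element of B.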


Thank you for your attention

8 References

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37–54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129–142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342–1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190–208. Springer.

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153–168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2–21. CEUR-WS.org.

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449–473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections, and strings.

• To deal with such data, we use pattern structures.

[Figure: RDF graph around the resource 350GT, with edges dc:subject to dbpc:Sports_cars and dbpc:Lamborghini_vehicles, rdf:type to dbo:Automobile, dbo:manufacturer to dbp:Lamborghini, and dbo:productionYear to the literal 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values (crosses are listed as extracted; their column positions are not preserved)

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear
Attributes: a, b, c, d, e, f, g

Entity              Crosses          dbo:productionStartYear
Reventon            × × × × × ×      ⟨[2008, 2008]⟩
Countach            × × × × ×        ⟨[1974, 1974]⟩
350GT               × × × × ×        ⟨[1963, 1963]⟩
400GT               × × × ×          ⟨[1965, 1965]⟩
Islero              × × × ×          ⟨[1967, 1967]⟩
Veneno              × ×              ⟨[2012, 2012]⟩
Aventador Roadster  × ×              -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{\text{350GT}, \text{400GT}, \text{Islero}\}$, then

• $(A_1)^{\circ\circ}$:
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$
  - $(A_1)^{\circ\circ} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\}$

• $(A_1)''$:
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{\text{Reventon}, \text{Countach}, \text{350GT}, \text{400GT}, \text{Islero}\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
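The derivation above can be replayed with a small sketch. Only the facts used on the slide are taken from it: the five cars sharing a, b, c, d, and the production years. The placement of the remaining crosses (attributes e and f, and the two crosses of Veneno and Aventador Roadster) is a hypothetical choice, and the function names are illustrative.

```python
# Binary part. Attributes a-d follow from the derivation on the slide;
# the placement of e and f, and the crosses of the last two cars, are hypothetical.
BINARY = {
    "Reventon": set("abcdef"),
    "Countach": set("abcde"),
    "350GT": set("abcdf"),
    "400GT": set("abcd"),
    "Islero": set("abcd"),
    "Veneno": set("ab"),
    "Aventador Roadster": set("ab"),
}

# Numeric part: production start year; Aventador Roadster has no value.
YEAR = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}


def common_attributes(entities):
    """Derivation in the binary context: attributes shared by all entities."""
    return set.intersection(*(BINARY[g] for g in entities))


def entities_with(attributes):
    """Dual derivation: entities owning every attribute of the set."""
    return {g for g, own in BINARY.items() if attributes <= own}


def interval_of(entities):
    """Interval pattern: smallest interval covering the years (all must be known)."""
    years = [YEAR[g] for g in entities]
    return min(years), max(years)


def entities_in(interval):
    """Entities whose year falls inside the interval."""
    lo, hi = interval
    return {g for g, year in YEAR.items() if lo <= year <= hi}


def heterogeneous_closure(entities):
    """Intersect the closure in the binary context with the interval closure."""
    return entities_with(common_attributes(entities)) & entities_in(interval_of(entities))


A1 = {"350GT", "400GT", "Islero"}
print(sorted(common_attributes(A1)))      # ['a', 'b', 'c', 'd']
print(interval_of(A1))                    # (1963, 1967)
print(heterogeneous_closure(A1) == A1)    # True
```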

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^{*}$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^{*}}$.

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Entities S over the attributes C1, C2, C4, C5, C6, C7, C8, C9 (crosses are listed as extracted; their column positions are not preserved)

s1  × × ×
s2  × × ×
s3  × ×
s4  × × ×
s5  × ×

[Figure: subsumption hierarchy rooted at ⊤ over the nodes C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]




sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree

sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction

sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling

Mehwish Alam Interactive Knowledge Discovery over Web of Data 72


References I

Codocedo, V. and Napoli, A. (2014). A Proposition for Combining Pattern Structures and Relational Concept Analysis. In 12th International Conference on Formal Concept Analysis.

Ferré, S. and Hermann, A. (2012). Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37-54.

Ganter, B. and Kuznetsov, S. O. (2001). Pattern structures and their projections. In Delugach, H. S. and Stumme, G., editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129-142. Springer.

Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg.

Mehwish Alam Interactive Knowledge Discovery over Web of Data 56

References II

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342-1347.

Kuznetsov, S. O. and Samokhin, M. V. (2005). Learning closed sets of labeled graphs for chemical applications. In Kramer, S. and Pfahringer, B., editors, Inductive Logic Programming, 15th International Conference, ILP 2005, Bonn, Germany, August 10-13, 2005, Proceedings, volume 3625 of Lecture Notes in Computer Science, pages 190-208. Springer.

References III

Leeuwenberg, A., Buzmakov, A., Toussaint, Y., and Napoli, A. (2015). Exploring pattern structures of syntactic trees for relation extraction. In Baixeries, J., Sacarea, C., and Ojeda-Aciego, M., editors, Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, volume 9113 of Lecture Notes in Computer Science, pages 153-168. Springer.

Rudolph, S. (2006). Relational exploration: combining description logics and formal concept analysis for knowledge specification. PhD thesis, Dresden University of Technology.

Sertkaya, B. (2010). A survey on how description logic ontologies benefit from FCA. In Kryszkiewicz, M. and Obiedkov, S. A., editors, CLA, volume 672 of CEUR Workshop Proceedings, pages 2-21. CEUR-WS.org.

References IV

Visani, M., Bertet, K., and Ogier, J. (2011). Navigala: an original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4):449-473.

From FCA to Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections and strings.

• To deal with such data, we use Pattern Structures.

[Figure: RDF graph. Nodes: 350GT, dbpc:Sports_cars, dbpc:Lamborghini_vehicles, dbo:Automobile, dbp:Lamborghini and the literal 1964 (xsd:date). Edge labels: dc:subject (twice), rdf:type, dbo:manufacturer, dbo:productionYear.]

Why Pattern Structures


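Sub-contexts $K_A$, $K_B$, $K_C$, $K_D$, $K_E$ carry the binary attributes a, b, c, d, e, f, g; $K_{dbo:productionStartYear}$ carries interval values. The column position of each × was lost in extraction, so only the number of × marks per row is shown.

Entity               × marks (a-g)   $K_{dbo:productionStartYear}$
Reventon             6               $\langle [2008, 2008] \rangle$
Countach             5               $\langle [1974, 1974] \rangle$
350GT                5               $\langle [1963, 1963] \rangle$
400GT                4               $\langle [1965, 1965] \rangle$
Islero               4               $\langle [1967, 1967] \rangle$
Veneno               2               $\langle [2012, 2012] \rangle$
Aventador Roadster   2               -

Heterogeneous context with numeric values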

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
- $range(p) \subseteq U$: $p$ is an object relation
- $range(p) \subseteq L$: $p$ is a literal relation

• Pattern structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
- $(D_p, \sqsubseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
- $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous pattern structure $(G, H, \Delta)$:
- $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
- $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures


Let $A_1 = \{350GT, 400GT, Islero\}$. Then:

• $(A_1)^{\circ\circ}$
- $(A_1)^{\circ} = \{a, b, c, d\}$
- $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
- $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
- $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
- $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
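
The numeric half of the example above can be replayed in a few lines. This is only a sketch under stated assumptions: the names `START_YEAR`, `interval_description` and `interval_extent` are hypothetical, the description of a set of entities is taken to be the smallest interval covering their years, and the binary closure of A1 is copied from the slide because the positions of the crosses in the table are not available.

```python
# Production start years read from the heterogeneous context above.
# Aventador Roadster has no value ("-") in the table, so it is left out.
START_YEAR = {
    "Reventon": 2008, "Countach": 1974, "350GT": 1963,
    "400GT": 1965, "Islero": 1967, "Veneno": 2012,
}

# Closure of A1 in the binary part, copied from the slide rather than
# recomputed, because the positions of the crosses are not available here.
BINARY_CLOSURE_A1 = {"Reventon", "Countach", "350GT", "400GT", "Islero"}


def interval_description(entities):
    """Smallest interval covering the start years of the given entities."""
    years = [START_YEAR[e] for e in entities]
    return (min(years), max(years))


def interval_extent(interval):
    """All entities whose start year falls inside the interval."""
    low, high = interval
    return {e for e, year in START_YEAR.items() if low <= year <= high}


a1 = {"350GT", "400GT", "Islero"}
interval = interval_description(a1)
numeric_closure = interval_extent(interval)
heterogeneous_closure = BINARY_CLOSURE_A1 & numeric_closure

print(interval)                     # (1963, 1967)
print(heterogeneous_closure == a1)  # True
```

Intersecting the two closures is what removes Reventon and Countach: they share the binary attributes a, b, c, d with A1 but fall outside the interval [1963, 1967].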

The proposition of [Carpineto and Romano 1996]

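• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

Formal context over the attributes C1, C2, C4, C5, C6, C7, C8, C9 (for s3, s4 and s5 the column position of each x was lost in extraction):

Entities S   Attributes
s1           C1, C2, C7
s2           C6, C8, C9
s3           2 attributes
s4           3 attributes
s5           2 attributes

[Figure: Hasse diagram of the subsumption hierarchy, with top element $\top$, the additional nodes C10, C11, C12, C13, C14, C15 and the context attributes C1, C2, C4, C5, C6, C7, C8, C9.]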


The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$ (here $y$ ranges over the maximal elements of $Idl$).

• The maximal elements of an ideal are pairwise incomparable, i.e. they form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

The proposition of [Ganter and Kuznetsov 2001]

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$.
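
The definition of the infimum of two antichains can be checked on a toy order. The sketch below is an illustration only, not the authors' implementation: the function name `antichain_infimum`, the universe 1..36 and the divisibility order are all hypothetical choices made so that the example is easy to verify by hand.

```python
def antichain_infimum(ac1, ac2, universe, leq):
    """Maximal elements of {m | exists a1 in ac1, a2 in ac2: m <= a1 and m <= a2}."""
    # Order ideal of all elements lying below some element of each antichain.
    ideal = {
        m for m in universe
        if any(leq(m, a) for a in ac1) and any(leq(m, b) for b in ac2)
    }
    # Keep only the elements of the ideal that have nothing strictly above them.
    return {m for m in ideal if not any(m != n and leq(m, n) for n in ideal)}


def divides(x, y):
    """Hypothetical order used for the demo: x <= y iff x divides y."""
    return y % x == 0


universe = range(1, 37)
result = antichain_infimum({12, 18}, {8, 27}, universe, divides)
print(sorted(result))  # [4, 9]
```

Here the ideal is {1, 2, 3, 4, 9}, the common divisors of some pair taken from the two antichains, and its maximal elements 4 and 9 again form an antichain.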

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
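
To make the RMQ problem concrete, here is a minimal sketch on the array from the slide. It uses a sparse table, which is a swapped-in technique: it answers queries in O(1) but needs O(n log n) preprocessing, not the O(n) preprocessing mentioned above. The function names `build_sparse_table` and `rmq` are hypothetical, and positions are 1-based to match the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the 0-based index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            # On ties keep the leftmost position.
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, i, j):
    """1-based position of the minimum over positions i..j (inclusive, any order)."""
    lo, hi = min(i, j) - 1, max(i, j) - 1
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


array = [2, 1, 0, 3, 2]
table = build_sparse_table(array)
print(rmq(array, table, 1, 5))  # 3
print(rmq(array, table, 4, 5))  # 5
```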

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ as positions in the traversal below, where D gives the depth of the node at each position.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$RMQ(4, 4) = 4$, giving $\{4\}$
$RMQ(4, 18) = 15$, giving $\{4, 15\}$
$RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\}$, which reduces to the antichain $\{4, 21\}$, i.e. $\{C_1, C_{13}\}$.
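
The naive approach can be replayed directly on the traversal from the slide. This is a sketch under assumptions, not the authors' code: `TOUR`, `DEPTH`, `rmq` and `naive_intersection` are hypothetical names, "TOP" stands for the top element, and the RMQ is a plain linear scan for brevity rather than a constant-time structure.

```python
# Node visited at each position of the traversal (1-based) and its depth.
TOUR = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
        "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """Leftmost position of minimal depth between positions i and j (1-based)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def naive_intersection(a_positions, b_positions):
    """One RMQ per pair (a, b), then drop every node that subsumes another result."""
    candidates = {rmq(a, b) for a in a_positions for b in b_positions}
    kept = set()
    for p in candidates:
        node = TOUR[p - 1]
        # p is dropped when its node is a proper ancestor of another candidate.
        is_more_general = any(
            TOUR[q - 1] != node and TOUR[rmq(p, q) - 1] == node
            for q in candidates
        )
        if not is_more_general:
            kept.add(node)
    return candidates, kept


candidates, antichain = naive_intersection([4, 12, 20], [4, 18, 22])
print(sorted(candidates))  # [4, 8, 15, 19, 21]
print(sorted(antichain))   # ['C1', 'C13']
```

The nine pairwise queries produce exactly the five positions listed in the result above. Positions 19 and 21 both denote C13, so the final antichain has two nodes.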

An associated scaling

• Scale the antichains to the corresponding filters.

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

[Figure: tree with root $\top$. $\top$ has children C12 and C6; C12 has children C10 (with leaves C1, C2) and C11 (with leaves C4, C5); C6 has the child C13 (with leaves C7, C8, C9).]

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
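
The scaling to filters can be sketched as follows. The `PARENT` map encodes the tree as it can be read off the traversal used for the naive approach, and the names `up_set`, `filter_of` and `minimal_elements` are hypothetical, as is the label "TOP" for the top element.

```python
# Parent of every node except the root "TOP".
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C13": "C6", "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def filter_of(antichain):
    """Union of the up-sets of the elements of the antichain."""
    return set().union(*(up_set(n) for n in antichain))


def minimal_elements(nodes):
    """Nodes that are not an ancestor of any other node in the set."""
    return {
        n for n in nodes
        if not any(m != n and n in up_set(m) for m in nodes)
    }


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
print(sorted(common))                    # ['C1', 'C10', 'C12', 'C13', 'C6', 'TOP']
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```

Each filter can grow to the size of the whole tree, which is the cost that the RMQ-based approach avoids.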

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements taken from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is harder than $O(|Leaves(T)|)$ for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.






From FCA to Pattern Structures

• Triples do not always contain URIs as objects.
• They may contain different data types and structures, including dates, numbers, collections, and strings.
• To deal with such data we use Pattern Structures.

[Figure: RDF graph around the entity 350GT, with edges dc:subject → dbpc:Sports_cars, dc:subject → dbpc:Lamborghini_vehicles, rdf:type → dbo:Automobile, dbo:manufacturer → dbp:Lamborghini, dbo:productionYear → 1964 (xsd:date)]

Why Pattern Structures

Heterogeneous context with numeric values:

Contexts: K_A, K_B, K_C, K_D, K_E, K_dbo:productionStartYear
Attributes: a, b, c, d, e, f, g

Reventon             × × × × × ×   ⟨[2008, 2008]⟩
Countach             × × × × ×     ⟨[1974, 1974]⟩
350GT                × × × × ×     ⟨[1963, 1963]⟩
400GT                × × × ×       ⟨[1965, 1965]⟩
Islero               × × × ×       ⟨[1967, 1967]⟩
Veneno               × ×           ⟨[2012, 2012]⟩
Aventador Roadster   × ×           -

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let a predicate $p \in P$:
  - $\mathrm{range}(p) \subseteq U$: $p$ is an object relation
  - $\mathrm{range}(p) \subseteq L$: $p$ is a literal relation
• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$:
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $\mathrm{range}(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$
• Heterogeneous Pattern Structures $(G, H, \Delta)$:
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

Let $A_1 = \{350GT, 400GT, Islero\}$, then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$
• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$
• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle [1963, 1967] \rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
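The interval part of this closure can be checked with a minimal Python sketch. The function names `interval_of` and `extent_of` are hypothetical, the years are copied from the heterogeneous context, and the closure in the binary part is taken as given from the slide rather than recomputed, since the positions of the crosses are not recoverable here.

```python
# Production start years copied from the heterogeneous context.
# Aventador Roadster has no year in the table, so it is left out.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}


def interval_of(entities):
    """Derivation on entities: smallest interval covering their years."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent_of(interval):
    """Derivation on an interval: entities whose year lies inside it."""
    low, high = interval
    return {e for e, year in YEARS.items() if low <= year <= high}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)              # (1963, 1967)
interval_closure = extent_of(interval)  # back to A1

# Closure of A1 in the binary part, taken as given from the slide.
context_closure = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

# Heterogeneous closure: entities that survive in every component.
heterogeneous_closure = context_closure & interval_closure
```

Reventon and Countach share the binary attributes of A1 but fall outside the interval, which is why the heterogeneous closure shrinks back to A1.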

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.
• Extended set of attributes $M^*$, organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.
• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S   C1 C2 C4 C5 C6 C7 C8 C9
s1           x x x
s2           x x x
s3           x x
s4           x x x
s5           x x

[Figure: subsumption hierarchy with top element ⊤ over the nodes C1, C2, C4 to C15]

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001], the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.
• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.
• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \leq y\}$.
• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.
• Recall that an antichain is an ordered set whose elements are pairwise incomparable.
• In the associated pattern structure $(G, (D, \subseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.
• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \leq ac_1 \text{ and } m \leq ac_2\}$.
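The infimum of two antichains can be sketched directly from this definition. The Python below is an illustration under two assumptions: the order is a tree given by a hypothetical `PARENT` map (the example tree used later in these slides, with `TOP` standing for ⊤), and $x \leq y$ is read as "x is y itself or an ancestor of y".

```python
# Parent links of the example tree; TOP (the root) has no parent.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def down_set(node):
    """All m with m <= node: the node itself and its ancestors."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def leq(x, y):
    """x <= y when x is y itself or an ancestor of y (assumed order)."""
    return x in down_set(y)


def infimum(ac1, ac2):
    """Maximal elements of {m | m <= a and m <= b, a in ac1, b in ac2}."""
    ideal = set()
    for a in ac1:
        for b in ac2:
            ideal |= down_set(a) & down_set(b)
    return {m for m in ideal
            if not any(m != n and leq(m, n) for n in ideal)}


result = infimum({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(result))  # ['C1', 'C13']
```

This brute-force version enumerates whole down-sets, so it is only meant to make the definition concrete, not to be efficient.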

Reduction of LCS to RMQ

• The computation of the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.
• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and in $O(1)$ time per query (where $n$ is the number of elements in the array).
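As an illustration of constant-time queries after preprocessing, here is a minimal sparse-table sketch in Python. Note that this is a simpler, swapped-in technique: its preprocessing is $O(n \log n)$, not the $O(n)$ scheme referred to above. The names `build_sparse_table` and `rmq` are hypothetical, and positions are 1-based as on the slide.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """1-based position of the minimal value among positions lo..hi."""
    if lo > hi:
        lo, hi = hi, lo
    i, j = lo - 1, hi - 1              # switch to 0-based indices
    k = (j - i + 1).bit_length() - 1   # largest power of two that fits
    left = table[k][i]
    right = table[k][j - (1 << k) + 1]
    best = left if values[left] <= values[right] else right
    return best + 1


values = [2, 1, 0, 3, 2]
table = build_sparse_table(values)
print(rmq(values, table, 1, 5))  # 3, the position of the value 0
print(rmq(values, table, 4, 5))  # 5, the position of the value 2
```

Each query combines two overlapping power-of-two blocks, which is why it needs no loop over the range.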

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have, in terms of positions, $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

D          0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0
Node       ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Position   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

With $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$:

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\} = \{4, 21\}$
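The pairwise computation above can be replayed with a short Python sketch. It is a sketch under assumptions: the RMQ is a plain linear scan rather than a preprocessed structure, `TOP` stands for ⊤, and the helper names (`rmq`, `position`, `is_ancestor`, `naive_intersection`) are hypothetical. The last step keeps only the most specific nodes, which is how the sketch reads the final reduction to {4, 21}.

```python
# Tour of the example tree and the depth of each visited node.
TOUR = ["TOP", "C12", "C10", "C1", "C10", "C2", "C10", "C12", "C11", "C4",
        "C11", "C5", "C11", "C12", "TOP", "C6", "C13", "C7", "C13", "C8",
        "C13", "C9", "C13", "C6", "TOP"]
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2,
         1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(lo, hi):
    """1-based position of the smallest depth between lo and hi (scan)."""
    if lo > hi:
        lo, hi = hi, lo
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def position(node):
    """1-based position of the first visit of a node in the tour."""
    return TOUR.index(node) + 1


def is_ancestor(x, y):
    """True when x is a proper ancestor of y, i.e. their LCS is x."""
    return x != y and TOUR[rmq(position(x), position(y)) - 1] == x


def naive_intersection(a_nodes, b_nodes):
    """One RMQ per pair, then keep only the most specific nodes."""
    positions = {rmq(position(a), position(b))
                 for a in a_nodes for b in b_nodes}
    nodes = {TOUR[p - 1] for p in positions}
    most_specific = {n for n in nodes
                     if not any(is_ancestor(n, m) for m in nodes)}
    return positions, most_specific


positions, nodes = naive_intersection(["C1", "C5", "C8"], ["C1", "C7", "C9"])
print(sorted(positions))  # [4, 8, 15, 19, 21]
print(sorted(nodes))      # ['C1', 'C13']
```

Positions 19 and 21 both denote C13, and positions 8 and 15 denote ancestors of C1, so the result corresponds to positions {4, 21}.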

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger than or equal to at least one element from the antichain.

[Figure: tree rooted at ⊤ with children C12 and C6; C12 has children C10 (leaves C1, C2) and C11 (leaves C4, C5); C6 has the child C13 (leaves C7, C8, C9)]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
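The scaling can be sketched in a few lines of Python, assuming the tree is stored as a hypothetical `PARENT` map (with `TOP` standing for ⊤) and that "larger" means "is an ancestor of". The names `filter_of` and `minimal_elements` are illustrative only.

```python
# Parent links of the example tree; TOP (the root) has no parent.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """Every node that belongs to the antichain or is an ancestor of one."""
    result = set()
    for node in antichain:
        while True:
            result.add(node)
            if node not in PARENT:
                break
            node = PARENT[node]
    return result


def minimal_elements(nodes):
    """Nodes of an upward-closed set that have no child inside the set."""
    has_child_inside = {PARENT[n] for n in nodes if n in PARENT}
    return nodes - has_child_inside


fil_a = filter_of({"C1", "C5", "C8"})
fil_b = filter_of({"C1", "C7", "C9"})
common = fil_a & fil_b
print(sorted(minimal_elements(common)))  # ['C1', 'C13']
```

Both filters are materialised in full before intersecting, so the work grows with the size of the tree rather than with the size of the antichains.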

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, one for each pair of elements from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ: intersecting two antichains by means of a scaling costs $O(|T|)$, the size of a filter, against $O(|\mathrm{Leaves}(T)|)$ for intersecting antichains directly.
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

Page 86: Interactive Knowledge Discovery over Web of Data

From FCA to Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

dbpcSports_cars dbpcLamborghini_vehicles

1964(xsddate)

350GT

dboAutomobile dbpLamborghini

dcsubject dcsubject

rdftype dbomanufacturer

dboproductionYear

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Why Pattern Structures

sbquo Triples do not always contain URIs as objects

sbquo They may dierent data types and structures including dates numberscollections strings

sbquo To deal with such a data we use Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Heterogeneous context with numeric values

Mehwish Alam Interactive Knowledge Discovery over Web of Data 60

Heterogeneous Pattern Structures[Codocedo and Napoli 2014]

sbquo Let a predicate p P P- rangeppq Ď U p is a object relation- rangeppq Ď L p is a literal relation

sbquo Pattern Structures Kp ldquo pG pDp[q δpq- pDp Ďq is an ordered set of descriptions dened for the elements in rangeppq- δp maps entities g P G to their descriptions in Dp

sbquo Heterogeneous Pattern Structures pG H∆q- H ldquo

Ś

Dppp P Pq is the Cartesian product of all the descriptions sets Dp - ∆ maps an entity g P G to a tuple where each component corresponds to adescription in a set Dp

Mehwish Alam Interactive Knowledge Discovery over Web of Data 61

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 64

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 65

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 66

The proposition of [Ganter and Kuznetsov 2001]

sbquo In [Ganter and Kuznetsov 2001] authors introduce the formalism of patternstructures and take as an instantiation structured attribute sets

sbquo More formally it is assumed that the attribute set pMďMq is nite andpartially ordered and that all attribute combinations that can occur must beorder ideals (downsets) of this order

sbquo Then any order ideal Idl can be described by the set of its maximalelements Idl ldquo tx |Dy P M x ď yu

sbquo Maximal elements from dierent ideals are not comparable ie the maximalelements form an antichain and conversely each antichain is the set ofmaximal elements of some order ideal

sbquo Recall that an antichain is an ordered set where elements are pairwise notcomparable

Mehwish Alam Interactive Knowledge Discovery over Web of Data 67

The proposition of [Ganter and Kuznetsov 2001]

sbquo In the associated pattern structure pG pDĎq δq the semilattice pD[q ofpatterns consists of all antichains of the ordered attribute set

sbquo For two antichains AC1 and AC2 the inmum AC1 [ AC2 consists of allmaximal elements of the order idealtm|Dac1 P AC1 Dac2 P AC2 m ď ac1 and m ď ac2u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Reduction of LCS to RMQ

sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query

sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array

Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5

Computational Time

The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$, the corresponding positions are $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

• $RMQ(4, 4) = 4$, giving $\{4\}$
• $RMQ(4, 18) = 15$, giving $\{4, 15\}$
• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$
• Result $= \{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$, i.e. the antichain $\{C1, C13\}$
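The walk-through above can be sketched as follows, under stated assumptions: the tree is rebuilt from the node sequence on the slide, `"T"` stands for the top element ⊤, and the names `TREE`, `euler_tour`, `rmq` and `naive_meet` are hypothetical. The range minimum is found by a plain scan so that the block stays short; a constant-time RMQ structure would replace it in practice.

```python
# Tree from the slide, as child lists; "T" stands for the top element.
TREE = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(tree, root):
    """Return the visited nodes and their depths, revisiting a parent after each child."""
    nodes, depths = [], []

    def visit(node, depth):
        nodes.append(node)
        depths.append(depth)
        for child in tree.get(node, []):
            visit(child, depth + 1)
            nodes.append(node)
            depths.append(depth)

    visit(root, 0)
    return nodes, depths


def rmq(depths, i, j):
    """1-based position of the smallest depth between positions i and j (linear scan)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: depths[p - 1])


def naive_meet(tree, root, a, b):
    nodes, depths = euler_tour(tree, root)
    first, last = {}, {}
    for pos, node in enumerate(nodes, start=1):
        first.setdefault(node, pos)
        last[node] = pos
    # One RMQ per pair (x, y): the node at the minimum depth is their common subsumer.
    candidates = {nodes[rmq(depths, first[x], first[y]) - 1] for x in a for y in b}

    def is_ancestor(u, v):
        return first[u] <= first[v] <= last[u]

    # Keep the most specific candidates: drop any that lies above another candidate.
    return sorted(c for c in candidates
                  if not any(d != c and is_ancestor(c, d) for d in candidates))


print(naive_meet(TREE, "T", ["C1", "C5", "C8"], ["C1", "C7", "C9"]))  # ['C1', 'C13']
```

The double loop over `a` and `b` is what makes this variant quadratic in the sizes of the two antichains.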

An associated scaling

• Scale the antichains to the corresponding filters.
• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain.

[Figure: tree with top ⊤; ⊤ has children C12 and C6; C12 has children C10 and C11; C10 has children C1 and C2; C11 has children C4 and C5; C6 has child C13; C13 has children C7, C8 and C9]

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
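As a rough illustration of the scaling, the filters can be computed by walking from each element up to the top. This is only a sketch: the parent map is read off the tree on the slide, `"T"` stands for ⊤, and `PARENT`, `up_chain`, `fil` and `meet_by_scaling` are hypothetical names.

```python
# Parent of every node in the slide's tree; "T" (the top element) has no parent.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_chain(node):
    """The node itself followed by all of its ancestors up to the top."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def fil(antichain):
    """Filter of an antichain: every element at or above one of its members."""
    return {n for x in antichain for n in up_chain(x)}


def meet_by_scaling(a, b):
    common = fil(a) & fil(b)
    # Minimal elements of the intersection: nothing else in it lies strictly below them.
    return {x for x in common
            if not any(y != x and x in up_chain(y) for y in common)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(fil(A) & fil(B)))        # ['C1', 'C10', 'C12', 'C13', 'C6', 'T']
print(sorted(meet_by_scaling(A, B)))  # ['C1', 'C13']
```

Each filter can grow up to the size of the whole tree, which is the cost the complexity discussion attributes to this approach.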

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A| \cdot |B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

About complexity of the approach

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• Even so, using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

Page 87: Interactive Knowledge Discovery over Web of Data

Why Pattern Structures

• Triples do not always contain URIs as objects.

• They may have different data types and structures, including dates, numbers, collections and strings.

• To deal with such data, we use Pattern Structures.

Contexts: $K_A$, $K_B$, $K_C$, $K_D$, $K_E$, $K_{dbo:productionStartYear}$; attributes: a, b, c, d, e, f, g

Entity               a to f (crosses as extracted, column positions not preserved)   g
Reventon             × × × × × ×                                                     ⟨[2008, 2008]⟩
Countach             × × × × ×                                                       ⟨[1974, 1974]⟩
350GT                × × × × ×                                                       ⟨[1963, 1963]⟩
400GT                × × × ×                                                         ⟨[1965, 1965]⟩
Islero               × × × ×                                                         ⟨[1967, 1967]⟩
Veneno               × ×                                                             ⟨[2012, 2012]⟩
Aventador Roadster   × ×                                                             -

Heterogeneous context with numeric values

Heterogeneous Pattern Structures [Codocedo and Napoli 2014]

• Let $p \in P$ be a predicate:
  - $range(p) \subseteq U$: $p$ is an object relation
  - $range(p) \subseteq L$: $p$ is a literal relation

• Pattern Structures $K_p = (G, (D_p, \sqcap), \delta_p)$
  - $(D_p, \subseteq)$ is an ordered set of descriptions defined for the elements in $range(p)$
  - $\delta_p$ maps entities $g \in G$ to their descriptions in $D_p$

• Heterogeneous Pattern Structures $(G, H, \Delta)$
  - $H = \prod_{p \in P} D_p$ is the Cartesian product of all the description sets $D_p$
  - $\Delta$ maps an entity $g \in G$ to a tuple where each component corresponds to a description in a set $D_p$

Heterogeneous Pattern Structures

With the heterogeneous context with numeric values, let $A_1 = \{350GT, 400GT, Islero\}$. Then:

• $(A_1)^{\circ\circ}$
  - $(A_1)^{\circ} = \{a, b, c, d\}$
  - $\{a, b, c, d\}^{\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$
  - $(A_1)^{\circ\circ} = \{Reventon, Countach, 350GT, 400GT, Islero\}$

• $(A_1)''$
  - $(A_1)' = [1963, 1967]$ and $[1963, 1967]' = A_1$
  - $(A_1)'' = A_1$

• $(A_1)^{\diamond\diamond} = \{Reventon, Countach, 350GT, 400GT, Islero\} \cap A_1 = A_1$

• Intent of this heterogeneous pattern concept: $(\{a, b\}, \{c\}, \{d\}, \langle[1963, 1967]\rangle)$

"Automobiles manufactured by Lamborghini between 1963 and 1967"
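The numeric half of this example can be checked with a small sketch of the interval derivation on the production years. The names `YEARS`, `interval_of` and `extent_of` are hypothetical, and the closure in the binary contexts is copied from the slide instead of being recomputed, since the positions of the crosses in the table are not known.

```python
# Production start years from the slide's table; Aventador Roadster has no
# value there ("-"), so it is left out of this mapping.
YEARS = {
    "Reventon": 2008,
    "Countach": 1974,
    "350GT": 1963,
    "400GT": 1965,
    "Islero": 1967,
    "Veneno": 2012,
}


def interval_of(entities):
    """Interval pattern shared by the entities: the smallest interval covering their years."""
    years = [YEARS[e] for e in entities]
    return (min(years), max(years))


def extent_of(interval):
    """All entities whose year falls inside the interval."""
    lo, hi = interval
    return {e for e, year in YEARS.items() if lo <= year <= hi}


A1 = {"350GT", "400GT", "Islero"}
interval = interval_of(A1)
numeric_closure = extent_of(interval)

# Closure of A1 in the binary contexts, taken from the slide (assumption: not recomputed).
BINARY_CLOSURE = {"Reventon", "Countach", "350GT", "400GT", "Islero"}

heterogeneous_closure = BINARY_CLOSURE & numeric_closure
print(interval)                       # (1963, 1967)
print(sorted(heterogeneous_closure))  # ['350GT', '400GT', 'Islero']
```

Intersecting the two closures is what rules out Reventon and Countach, whose years fall outside the interval.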

The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• An extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\le_{M^*}$.

• Extended intersection operation on $s_1 = \{C1, C2, C7\}$ and $s_2 = \{C6, C8, C9\}$.

Entities S over attributes C1, C2, C4, C5, C6, C7, C8, C9 (crosses as extracted, column positions not preserved):
s1: x x x
s2: x x x
s3: x x
s4: x x x
s5: x x

[Figure: subsumption hierarchy over ⊤, C12, C10, C1, C2, C11, C4, C5, C15, C13, C6, C14, C7, C8, C9]


Page 89: Interactive Knowledge Discovery over Web of Data

Heterogeneous Pattern Structures

KA KB KC KD KE KdboproductionStartYeara b c d e f g

Reventon ˆ ˆ ˆ ˆ ˆ ˆ xr2008 2008syCountach ˆ ˆ ˆ ˆ ˆ xr1974 1974sy350GT ˆ ˆ ˆ ˆ ˆ xr1963 1963sy400GT ˆ ˆ ˆ ˆ xr1965 1965syIslero ˆ ˆ ˆ ˆ xr1967 1967syVeneno ˆ ˆ xr2012 2012syAventador Roadster ˆ ˆ -

Let A1 ldquo t350GT 400GT Islerou thensbquo pA1q

˝˝

- pA1q˝ ldquo ta b c du

- ta b c du˝ ldquo tReventonCountach 350GT 400GT Islerou- pA1q

˝˝ ldquo tReventonCountach 350GT 400GT Islerou

sbquo pA1q2

- pA1q1 ldquo r1963acute 1967s and r1963acute 1967s1 ldquo A1

- pA1q2 ldquo A1

sbquo pA1q˛˛ldquo tReventonCountach 350GT 400GT Islerou X A1 ldquo A1

sbquo Intent of this heterogeneous pattern conceptpta bu tcu tdu xr1963 1967syq

ldquoAutomobiles manufactured by Lamborghini between 1963 and 1967rdquo

Mehwish Alam Interactive Knowledge Discovery over Web of Data 62

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 63

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 64

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 65

The proposition of [Carpineto and Romano 1996]

sbquo Let K ldquo pG M I q be a formal context

sbquo extended set of attributes M˚ organized in a subsumption hierarchy basedon a partial ordering ďM˚

sbquo Extended intersection operation on s1 ldquo tC1C2C7u and s2 ldquo tC6C8C9u

Entities S C1 C2 C4 C5 C6 C7 C8 C9

s1 x x xs2 x x xs3 x xs4 x x xs5 x x

J

C12

C10

C1 C2

C11

C4 C5

C15

C13

C6

C14

C7 C8 C9

Mehwish Alam Interactive Knowledge Discovery over Web of Data 66

The proposition of [Ganter and Kuznetsov 2001]

• [Ganter and Kuznetsov 2001] introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, the attribute set $(M, \le_M)$ is assumed to be finite and partially ordered, and every attribute combination that can occur must be an order ideal (downset) of this order.

• Any order ideal $Idl$ can then be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M : x \le y\}$, where $y$ ranges over those maximal elements.

• The maximal elements of an ideal are pairwise incomparable, i.e. they form an antichain; conversely, each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1, \exists ac_2 \in AC_2 : m \le ac_1 \text{ and } m \le ac_2\}$.
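The definition of the infimum of two antichains can be sketched directly in code. This is only an illustration under stated assumptions: the function name `meet`, the parameters `elements` and `leq`, and the divisibility poset used as data are hypothetical and do not come from the talk.

```python
def meet(ac1, ac2, elements, leq):
    """Infimum of two antichains: maximal elements of the common order ideal."""
    # Order ideal {m | exists x in ac1, y in ac2 with m <= x and m <= y}.
    ideal = {m for m in elements
             if any(leq(m, x) for x in ac1) and any(leq(m, y) for y in ac2)}
    # Keep the maximal elements of that ideal.
    return {m for m in ideal
            if not any(m != n and leq(m, n) for n in ideal)}


# Hypothetical poset for illustration: divisors of 12 ordered by divisibility.
elements = {1, 2, 3, 4, 6, 12}


def leq(x, y):
    """x <= y in the illustrative order iff x divides y."""
    return y % x == 0


print(sorted(meet({4, 6}, {3, 4}, elements, leq)))  # [3, 4]
```

The brute-force scan over `elements` is quadratic in the size of the ideal; it is meant to mirror the definition, not to be efficient.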

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the Range Minimum Query (RMQ) problem.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value within a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

Finding such a position takes $O(n)$ preprocessing time and $O(1)$ time per query, where $n$ is the number of elements in the array.
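To make the RMQ problem concrete, here is a minimal sketch based on a sparse table. Note the swap: a sparse table needs $O(n \log n)$ preprocessing, not the $O(n)$ quoted on the slide, but it keeps the $O(1)$ query time and is much shorter to write down. The function names are illustrative assumptions.

```python
def build_sparse_table(values):
    """table[k][i] is the index of the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(values, table, lo, hi):
    """Index of the minimum of values[lo..hi] (0-based, inclusive)."""
    k = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**k cover the whole range.
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


values = [2, 1, 0, 3, 2]
table = build_sparse_table(values)
# The slide numbers positions from 1, so add 1 to the 0-based index.
print(rmq(values, table, 0, 4) + 1)  # 3
```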

Naive Approach

For $A = \{C1, C5, C8\}$ and $B = \{C1, C7, C9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ as positions in the tour below, where D gives the depth of each visited node.

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

RMQ(4, 4) = 4, partial result {4}
RMQ(4, 18) = 15, partial result {4, 15}
RMQ(20, 18) = 19, partial result {4, 15, 19}

Result = {4, 8, 15, 19, 21}, which reduces to {4, 21}, i.e. the antichain {C1, C13}.
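The walk-through above can be replayed with a short sketch. It assumes that the naive approach issues one RMQ per pair of positions and then keeps only the most specific nodes; the linear-scan `rmq`, the helper names and the label `TOP` (standing in for ⊤) are illustrative choices, not taken from the talk.

```python
from itertools import product

# Tour and depth array copied from the slide; "TOP" stands in for the root.
NODES = ("TOP C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 TOP "
         "C6 C13 C7 C13 C8 C13 C9 C13 C6 TOP").split()
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def is_ancestor_or_self(x, y):
    """True if node x lies on the path from node y up to the root."""
    px, py = NODES.index(x) + 1, NODES.index(y) + 1
    return NODES[rmq(px, py) - 1] == x


def naive_intersection(a_positions, b_positions):
    # One RMQ per pair (a, b): |A| * |B| queries in total.
    positions = sorted({rmq(a, b)
                        for a, b in product(a_positions, b_positions)})
    candidates = {NODES[p - 1] for p in positions}
    # Drop every candidate that has another candidate below it.
    result = {c for c in candidates
              if not any(d != c and is_ancestor_or_self(c, d)
                         for d in candidates)}
    return positions, result


positions, result = naive_intersection([4, 12, 20], [4, 18, 22])
print(positions, sorted(result))  # [4, 8, 15, 19, 21] ['C1', 'C13']
```

The linear scan inside `rmq` keeps the sketch short; a constant-time RMQ structure would replace it in practice.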

An associated scaling

• Scale the antichains to the corresponding filters.

• The filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element of the antichain.

⊤
  C12
    C10: C1, C2
    C11: C4, C5
  C6
    C13: C7, C8, C9

$A = \{C1, C5, C8\}$, $Fil(A) = \{C1, C10, C12, \top, C5, C11, C8, C13, C6\}$

$B = \{C1, C7, C9\}$, $Fil(B) = \{C1, C10, C12, \top, C7, C9, C13, C6\}$

$Fil(A) \cap Fil(B) = \{C1, C10, C12, \top, C13, C6\}$, which yields the antichain $\{C1, C13\}$.
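The scaling to filters can be sketched as follows. The child-to-parent map encodes the tree of the example; the function names and the label `TOP` (standing in for ⊤) are assumptions made for this illustration.

```python
# Child -> parent links of the example tree; "TOP" stands in for the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12", "C13": "C6",
    "C1": "C10", "C2": "C10", "C4": "C11", "C5": "C11",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up_set(node):
    """The node together with all of its ancestors."""
    chain = {node}
    while node in PARENT:
        node = PARENT[node]
        chain.add(node)
    return chain


def filter_of(antichain):
    """Union of the up-sets of the antichain's elements."""
    return set().union(*(up_set(n) for n in antichain))


def intersect_by_scaling(a, b):
    common = filter_of(a) & filter_of(b)
    # Minimal elements of the common filter form the resulting antichain.
    return {n for n in common
            if not any(m != n and n in up_set(m) for m in common)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
print(sorted(intersect_by_scaling(A, B)))  # ['C1', 'C13']
```

Each filter is materialised in full here, which is why this route depends on the size of the tree and not only on the sizes of the two antichains.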

Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A||B|)$, one per pair of elements.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$; hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and so is the number of RMQs over consecutive elements.

Complexity of associated scaling

The filter-based approach has a higher complexity than the RMQ-based approach. The size of a filter is $O(|T|)$, so intersecting two antichains by means of a scaling costs $O(|T|)$, which is more than the $O(|Leaves(T)|)$ needed to intersect the antichains directly.
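The bound for the improved approach can be illustrated with a sketch. It rests on an assumption about how that approach works, since the algorithm itself is not spelled out here: the positions of both antichains are merged in tour order and an RMQ is issued only for neighbouring positions that come from different antichains. The depth array is the one from the naive example; all names are illustrative.

```python
# Depth array of the example tour (positions are 1-based on the slide).
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTH[p - 1])


def consecutive_rmqs(a_positions, b_positions):
    # Tag every position with its antichain, then sort by tour position.
    merged = sorted([(p, "A") for p in a_positions] +
                    [(p, "B") for p in b_positions])
    answers = []
    # At most |A| + |B| - 1 neighbouring pairs exist in the merged list.
    for (p, src_p), (q, src_q) in zip(merged, merged[1:]):
        if src_p != src_q:  # assumption: only mixed pairs are queried
            answers.append(rmq(p, q))
    return answers


print(consecutive_rmqs([4, 12, 20], [4, 18, 22]))  # [4, 8, 15, 19, 21]
```

On the running example this issues five queries instead of nine and returns the same positions as the naive result; redundant positions still have to be discarded afterwards.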

About complexity of the approach

• The filter-based approach has a higher complexity than the RMQ-based approach: the size of a filter is $O(|T|)$, so intersecting two antichains by means of a scaling costs $O(|T|)$, which is more than the $O(|Leaves(T)|)$ needed to intersect the antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus have to be added back to the reduction.

• Even so, such a reduction is still computationally more efficient than computing the intersection of antichains in a poset by means of a scaling.




Mehwish Alam Interactive Knowledge Discovery over Web of Data 71

About complexity of the approach

sbquo The approach based on lters has a higher complexity than the approachbased on RMQThe size of a lter is Op|T |q and thus the computational complexity ofintersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly

sbquo The intersection of antichains in arbitrary posets can be reduced to theintersection of antichains in a tree

sbquo When a poset is reduced to a tree some subsumption relations may be lostand thus should be added to the reduction

sbquo But using such a reduction is still more computationally ecient thancomputing the intersection of antichains in a poset by means of a scaling

Mehwish Alam Interactive Knowledge Discovery over Web of Data 72


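The proposition of [Carpineto and Romano 1996]

• Let $K = (G, M, I)$ be a formal context.

• The extended set of attributes $M^*$ is organized in a subsumption hierarchy based on a partial ordering $\leq_{M^*}$.

• Extended intersection operation on $s_1 = \{C_1, C_2, C_7\}$ and $s_2 = \{C_6, C_8, C_9\}$.

[Table: formal context with entities $s_1$ to $s_5$ as rows and attributes $C_1$, $C_2$, $C_4$, $C_5$, $C_6$, $C_7$, $C_8$, $C_9$ as columns. Rows $s_1$, $s_2$ and $s_4$ carry three marks each, rows $s_3$ and $s_5$ two marks each; the column positions of the marks are not recoverable.]

[Figure: subsumption hierarchy with top element $\top$ over the attributes $C_1$, $C_2$ and $C_4$ to $C_{15}$.]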

The proposition of [Ganter and Kuznetsov 2001]

• In [Ganter and Kuznetsov 2001] the authors introduce the formalism of pattern structures and take structured attribute sets as an instantiation.

• More formally, it is assumed that the attribute set $(M, \leq_M)$ is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order.

• Then any order ideal $Idl$ can be described by the set of its maximal elements: $Idl = \{x \mid \exists y \in M,\ x \leq y\}$.

• Maximal elements from different ideals are not comparable, i.e. the maximal elements form an antichain, and conversely each antichain is the set of maximal elements of some order ideal.

• Recall that an antichain is a subset of an ordered set whose elements are pairwise incomparable.

• In the associated pattern structure $(G, (D, \sqsubseteq), \delta)$, the semilattice $(D, \sqcap)$ of patterns consists of all antichains of the ordered attribute set.

• For two antichains $AC_1$ and $AC_2$, the infimum $AC_1 \sqcap AC_2$ consists of all maximal elements of the order ideal $\{m \mid \exists ac_1 \in AC_1,\ \exists ac_2 \in AC_2,\ m \leq ac_1 \text{ and } m \leq ac_2\}$.

Reduction of LCS to RMQ

• Computing the LCS within a tree can be reduced to the RMQ problem, i.e. Range Minimum Query.

• Given an array of numbers, the RMQ problem consists in efficiently finding the position of the minimal value in a given range (interval) of positions of this array.

Array      [ 2 1 0 3 2 ]
Positions    1 2 3 4 5

Computational Time

The problem of finding such a position can be solved in $O(n)$ preprocessing time and $O(1)$ time per query (where $n$ is the number of elements in the array).
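The reduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: LCS is read here as the lowest common ancestor of two nodes in the tree, the tree is the example used on the following slides (its structure is read off the Euler tour shown there, with "T" standing for $\top$), and the names `CHILDREN`, `euler_tour`, `build_sparse_table`, `rmq` and `lcs` are hypothetical. The sparse table used below needs $O(n \log n)$ preprocessing, not the $O(n)$ quoted on the slide; it is chosen only because it is short.

```python
# Hypothetical encoding of the example tree; "T" stands for the top element.
CHILDREN = {
    "T": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return the Euler tour (node labels) and the depth of each visited node."""
    labels, depths = [], []

    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            # Coming back from a child, the parent is recorded again.
            labels.append(node)
            depths.append(depth)

    visit(root, 0)
    return labels, depths


def build_sparse_table(values):
    """table[k][i] is the index of a minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            a, b = prev[i], prev[i + half]
            row.append(a if values[a] <= values[b] else b)
        table.append(row)
        k += 1
    return table


def rmq(table, values, i, j):
    """0-based index of a minimum of values[i..j] (inclusive), in O(1)."""
    if i > j:
        i, j = j, i
    k = (j - i + 1).bit_length() - 1
    a, b = table[k][i], table[k][j - (1 << k) + 1]
    return a if values[a] <= values[b] else b


LABELS, DEPTHS = euler_tour("T")
TABLE = build_sparse_table(DEPTHS)
FIRST = {}
for idx, node in enumerate(LABELS):
    FIRST.setdefault(node, idx)  # first occurrence of each node in the tour


def lcs(x, y):
    """Lowest common ancestor of x and y via one RMQ on the depth array."""
    return LABELS[rmq(TABLE, DEPTHS, FIRST[x], FIRST[y])]


print(lcs("C1", "C5"), lcs("C7", "C8"), lcs("C1", "C7"))
```

The shallowest node met between the two nodes on the Euler tour is their common ancestor of greatest depth, which is why a single range minimum query on the depth array is enough.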

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have the position sets $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ in the Euler tour below ($D$ is the depth of each visited node).

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
D         0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

An RMQ is computed for every pair of positions from $A$ and $B$, for example:

• $RMQ(4, 4) = 4$, giving $\{4\}$

• $RMQ(4, 18) = 15$, giving $\{4, 15\}$

• $RMQ(20, 18) = 19$, giving $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\}$, which reduces to $\{4, 21\}$, i.e. the antichain $\{C_1, C_{13}\}$.
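The naive procedure can be replayed on the slide's data with the following sketch. The arrays `E` and `D` copy the Euler tour and depths from the slide ("T" stands for $\top$); the function names are hypothetical, and `rmq` is a plain linear scan used only for brevity, not the constant-time structure mentioned earlier. The final filtering step, which keeps only the most specific nodes, is an assumption about how the slide passes from the five positions to the final antichain.

```python
# Euler tour labels and depths copied from the slide; "T" is the top element.
E = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
     "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
     1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the leftmost minimum of D between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def naive_intersection(A, B):
    """Run one RMQ per pair (a, b), then keep the most specific nodes."""
    positions = {rmq(a, b) for a in A for b in B}
    nodes = {E[p - 1] for p in positions}

    # A node x lies above y when the tour segment of x encloses that of y.
    first = {n: E.index(n) for n in nodes}
    last = {n: len(E) - 1 - E[::-1].index(n) for n in nodes}

    def strictly_above(x, y):
        return x != y and first[x] <= first[y] and last[y] <= last[x]

    antichain = {n for n in nodes
                 if not any(strictly_above(n, m) for m in nodes)}
    return positions, antichain


positions, antichain = naive_intersection([4, 12, 20], [4, 18, 22])
print(sorted(positions), sorted(antichain))
```

With $|A| = |B| = 3$ the double loop issues nine queries, which is the quadratic behaviour the complexity slide refers to.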

An associated scaling

• Scale the antichains to the corresponding filters.

• A filter corresponding to an antichain in a poset is the set of all elements of the poset that are greater than or equal to at least one element from the antichain.

[Figure: the example tree]

⊤
    C12
        C10
            C1  C2
        C11
            C4  C5
    C6
        C13
            C7  C8  C9

$A = \{C_1, C_5, C_8\}$, $Fil(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$Fil(A) \cap Fil(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.
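The scaling can be checked with a small sketch. The `PARENT` map is a hypothetical encoding of the example tree ("T" stands for $\top$), and `up`, `fil` and `minimal` are illustrative names rather than anything defined in the deck. A filter is built by walking from every antichain element to the root, and the resulting antichain is taken to be the set of minimal elements of the intersection.

```python
# Hypothetical child -> parent encoding of the example tree; "T" is the top.
PARENT = {
    "C12": "T", "C6": "T",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def up(node):
    """The node together with all of its ancestors."""
    out = {node}
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out


def fil(antichain):
    """Filter of an antichain: everything at or above one of its elements."""
    return set().union(*(up(c) for c in antichain))


def minimal(nodes):
    """Elements of `nodes` that are not above any other element of `nodes`."""
    return {n for n in nodes
            if not any(m != n and n in up(m) for m in nodes)}


A = {"C1", "C5", "C8"}
B = {"C1", "C7", "C9"}
common = fil(A) & fil(B)
print(sorted(common), sorted(minimal(common)))
```

Each filter can contain up to every node of the tree, so this route touches $O(|T|)$ elements even when the antichains themselves are small.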

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is $O(|A||B|)$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.
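The slide gives only the query count of the improved approach, not its procedure. The sketch below is one possible reading, stated as an assumption: the positions of $A \cup B$ are sorted, an RMQ is issued only for neighbouring positions that come from different antichains, and positions shared by both antichains are kept as they are. The arrays `E` and `D` copy the Euler tour and depths of the running example ("T" stands for $\top$), `rmq` is a linear scan used for brevity, and all function names are hypothetical.

```python
# Euler tour labels and depths of the running example; "T" is the top element.
E = ("T C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 T "
     "C6 C13 C7 C13 C8 C13 C9 C13 C6 T").split()
D = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
     1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def rmq(i, j):
    """1-based position of the leftmost minimum of D between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: D[p - 1])


def improved_positions(A, B):
    """At most |A| + |B| - 1 RMQs, one per neighbouring pair in sorted order."""
    source = {}
    for p in A:
        source.setdefault(p, set()).add("A")
    for p in B:
        source.setdefault(p, set()).add("B")
    merged = sorted(source)

    # Positions present in both antichains are common to A and B as they are.
    hits = {p for p in merged if len(source[p]) == 2}

    for x, y in zip(merged, merged[1:]):
        # Query only neighbours that together represent both antichains.
        if len(source[x] | source[y]) == 2:
            hits.add(rmq(x, y))
    return hits


hits = improved_positions([4, 12, 20], [4, 18, 22])
print(sorted(hits), sorted({E[p - 1] for p in hits}))
```

On the running example this issues four queries instead of nine and reaches the same five positions as the naive approach; the most specific nodes still have to be selected afterwards.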

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|Leaves(T)|)$ needed for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still computationally more efficient than computing the intersection of antichains in a poset by means of a scaling.




Reduction of LCS to RMQ

sbquo The computing of the LCS within a tree can be reduced to the RMQproblem ie Range Minimum Query

sbquo Given an array of numbers the RMQ problem consists in eciently ndingthe position of the minimal value in a given range (interval) of positions forthis array

Array [ 2 1 0 3 2 ]Positions 1 2 3 4 5

Computational Time

The problem of nding such a position can be solved in Opnq preprocessingcomputational time and in Op1q computational time per one query (where n isthe number of elements in the array)

Mehwish Alam Interactive Knowledge Discovery over Web of Data 68

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 4q ldquo 4t4u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquot4 u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp4 18q ldquo 15t4 15u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquot4 15 u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uRMQp20 18q ldquo 19t4 15 19u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

An associated scaling

sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain

J

C12

C10

C1 C2

C11

C4 C5

C6

C13

C7 C8 C9

A ldquo tC1C5C8u

B ldquo tC1C7C9u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 70

An associated scaling

sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain

J

C12

C10

C1 C2

C11

C4 C5

C6

C13

C7 C8 C9

A ldquo tC1C5C8u FilpAq ldquo tC1C10C12JC5C11C8C13C6u

B ldquo tC1C7C9u FilpBq ldquo tC1C10C12JC7C9C13C6u

FilpAq X FilpBq ldquo tC1C10C12JC13C6u which yields the antichain tC1C13u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 70

Complexity

Complexity of Naive Approach

The number of RMQs of consecutive elements is Op|A||B|q

Complexity of Improved Approach

The cardinality of AY B is less then |A| ` |B| hence the number of theconsecutive elements is Op|A| ` |B|q and thus the number of RMQs ofconsecutive elements is Op|A| ` |B|q

Complexity of associated scaling

The approach based on lters has a higher complexity than the approach basedon RMQ The size of a lter is Op|T |q and thus the computational complexityof intersecting two antichains by means of a scaling is Op|T |q which is harderthan Op|LeavespT q|q for intersecting antichains directly

About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.
• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.
• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.
• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.

  • Motivation
  • Background
    • Fundamentals of Formal Concept Analysis
    • Pattern Structures
      • RDF Pattern Structures
      • Creating Views over RDF-Graphs
        • View By
          • Completing RDF Data
            • Motivation
            • Methodology
            • Experimentation & Evaluation
              • Data Analysis through RV-Xplorer
              • Conclusion and Perspectives
              • References
                • Heterogeneous Pattern Structures
                • RDF Pattern Structures
Page 97: Interactive Knowledge Discovery over Web of Data

Naive Approach

For $A = \{C_1, C_5, C_8\}$ and $B = \{C_1, C_7, C_9\}$ we have $A = \{4, 12, 20\}$ and $B = \{4, 18, 22\}$ as positions in the Euler tour of the tree:

Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Node      ⊤   C12 C10 C1  C10 C2  C10 C12 C11 C4  C11 C5  C11 C12 ⊤   C6  C13 C7  C13 C8  C13 C9  C13 C6  ⊤
Depth D   0   1   2   3   2   3   2   1   2   3   2   3   2   1   0   1   2   3   2   3   2   3   2   1   0

$\mathrm{RMQ}(4, 4) = 4$, partial result $\{4\}$

$\mathrm{RMQ}(4, 18) = 15$, partial result $\{4, 15\}$

$\mathrm{RMQ}(20, 18) = 19$, partial result $\{4, 15, 19\}$

Result $= \{4, 8, 15, 19, 21\}$, which reduces to the antichain $\{4, 21\}$, i.e. $\{C_1, C_{13}\}$.
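The worked example can be reproduced with a short Python sketch. It is a minimal illustration, not the author's implementation: the children lists are an assumption reconstructed from the node sequence of the tour, "TOP" stands for the root, the helper names are hypothetical, and `rmq` is a linear scan, whereas a real implementation would normally use a preprocessed structure such as a sparse table.

```python
# Children lists of the example tree (assumption: reconstructed from the
# Euler tour on the slide); "TOP" stands for the root.
CHILDREN = {
    "TOP": ["C12", "C6"],
    "C12": ["C10", "C11"],
    "C10": ["C1", "C2"],
    "C11": ["C4", "C5"],
    "C6": ["C13"],
    "C13": ["C7", "C8", "C9"],
}


def euler_tour(root):
    """Return (labels, depths); a node is revisited after each of its children."""
    labels, depths = [], []

    def visit(node, depth):
        labels.append(node)
        depths.append(depth)
        for child in CHILDREN.get(node, []):
            visit(child, depth + 1)
            labels.append(node)
            depths.append(depth)

    visit(root, 0)
    return labels, depths


LABELS, DEPTHS = euler_tour("TOP")


def position(node):
    """1-based position of the first occurrence of a node in the tour."""
    return LABELS.index(node) + 1


def rmq(i, j):
    """1-based position of the minimum depth between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=lambda p: DEPTHS[p - 1])


def lca(x, y):
    """Lowest common ancestor of two nodes via an RMQ on the depth array."""
    return LABELS[rmq(position(x), position(y)) - 1]


def naive_intersection(a, b):
    """One RMQ per pair (x, y), then keep only the minimal nodes."""
    hits = {rmq(position(x), position(y)) for x in a for y in b}
    nodes = {LABELS[p - 1] for p in hits}
    minimal = {x for x in nodes
               if not any(y != x and lca(x, y) == x for y in nodes)}
    return hits, minimal


hits, result = naive_intersection(["C1", "C5", "C8"], ["C1", "C7", "C9"])
print(sorted(hits), sorted(result))  # [4, 8, 15, 19, 21] ['C1', 'C13']
```

Positions 19 and 21 both denote $C_{13}$, which is why five positions collapse to four distinct nodes before the reduction to minimal elements.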

Page 103: Interactive Knowledge Discovery over Web of Data

Naive Approach

For A ldquo tC1C5C8u and B ldquo tC1C7C9u we have A ldquo t4 12 20u andB ldquo t4 18 22u

D [ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0 ]J C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 J C6 C13 C7 C13 C8 C13 C9 C13 C6 J

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

A ldquo t4 12 20uB ldquo t4 18 22uResult = t4 8 15 19 21u ldquo t4 21u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 69

An associated scaling

sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain

J

C12

C10

C1 C2

C11

C4 C5

C6

C13

C7 C8 C9

A ldquo tC1C5C8u

B ldquo tC1C7C9u

Mehwish Alam Interactive Knowledge Discovery over Web of Data 70

An associated scaling

sbquo Scale the antichains to the corresponding lterssbquo A lter corresponding to an antichain in a poset is the set of all elements ofthe poset that are larger than at least one element from the antichain

J

C12

C10

C1 C2

C11

C4 C5

C6

C13

C7 C8 C9

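$A = \{C_1, C_5, C_8\}$, $\mathrm{Fil}(A) = \{C_1, C_{10}, C_{12}, \top, C_5, C_{11}, C_8, C_{13}, C_6\}$

$B = \{C_1, C_7, C_9\}$, $\mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_7, C_9, C_{13}, C_6\}$

$\mathrm{Fil}(A) \cap \mathrm{Fil}(B) = \{C_1, C_{10}, C_{12}, \top, C_{13}, C_6\}$, which yields the antichain $\{C_1, C_{13}\}$.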
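The scaling can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the tree is encoded as a hypothetical child-to-parent dictionary `PARENT` (with `TOP` standing for ⊤), and the function names `filter_of`, `minimal_elements` and `intersect_by_scaling` are illustrative, not taken from the original work.

```python
# Hypothetical encoding of the example tree: child -> parent, TOP is the root.
PARENT = {
    "C12": "TOP", "C6": "TOP",
    "C10": "C12", "C11": "C12",
    "C1": "C10", "C2": "C10",
    "C4": "C11", "C5": "C11",
    "C13": "C6",
    "C7": "C13", "C8": "C13", "C9": "C13",
}


def filter_of(antichain):
    """All nodes that are greater than or equal to some element of the antichain."""
    up = set()
    for node in antichain:
        while node is not None:
            up.add(node)
            node = PARENT.get(node)  # None once the root has been passed
    return up


def minimal_elements(nodes):
    """Elements of the set that have no proper descendant inside the set."""
    dominated = set()
    for node in nodes:
        parent = PARENT.get(node)
        while parent is not None:
            dominated.add(parent)
            parent = PARENT.get(parent)
    return nodes - dominated


def intersect_by_scaling(a, b):
    """Intersect the two filters, then read the antichain off the result."""
    common = filter_of(a) & filter_of(b)
    return common, minimal_elements(common)


common, antichain = intersect_by_scaling({"C1", "C5", "C8"}, {"C1", "C7", "C9"})
print(sorted(common))
print(sorted(antichain))
```

Each filter is materialised in full here, so the work grows with the size of the tree rather than with the size of the antichains.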


Complexity

Complexity of Naive Approach

The number of RMQs is $O(|A| \cdot |B|)$, one for each pair of elements from $A$ and $B$.

Complexity of Improved Approach

The cardinality of $A \cup B$ is at most $|A| + |B|$, hence the number of pairs of consecutive elements is $O(|A| + |B|)$, and thus the number of RMQs of consecutive elements is $O(|A| + |B|)$.
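One plausible reading of the improved approach is sketched below in Python. Several points are assumptions and not stated in the slides: positions that occur in both antichains are kept as their own result, only neighbouring positions that come from different antichains are queried, and the range-minimum queries use a sparse table, a standard structure chosen here for illustration. All function names are hypothetical.

```python
# Depth array D of the Euler tour from the example (positions are 1-based).
DEPTH = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 0,
         1, 2, 3, 2, 3, 2, 3, 2, 1, 0]


def build_sparse_table(depth):
    """table[k][i] is the index of the minimum of depth[i : i + 2**k]."""
    n = len(depth)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if depth[left] <= depth[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(table, depth, i, j):
    """Index of the minimum depth in the 0-based inclusive range [i, j]."""
    if i > j:
        i, j = j, i
    k = (j - i + 1).bit_length() - 1
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    return left if depth[left] <= depth[right] else right


def improved_candidates(pos_a, pos_b, depth=DEPTH):
    """Merge the sorted positions and query only neighbouring pairs."""
    a, b = set(pos_a), set(pos_b)
    table = build_sparse_table(depth)
    merged = sorted(a | b)
    candidates = set(a & b)  # shared positions are kept as they are
    queries = 0
    for x, y in zip(merged, merged[1:]):
        # Assumption: query a neighbouring pair only if it mixes A and B.
        if (x in a and y in b) or (x in b and y in a):
            candidates.add(rmq(table, depth, x - 1, y - 1) + 1)
            queries += 1
    return candidates, queries


print(improved_candidates({4, 12, 20}, {4, 18, 22}))
```

On the example this issues 4 range-minimum queries, where a pairwise comparison of the two antichains would issue 9.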

Complexity of associated scaling

The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.
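The comparison can be made concrete on the running example. The counts below are read off the example tree and serve only as an illustration. The bound $|A| \le |\mathrm{Leaves}(T)|$ is an added justification that assumes $T$ is a tree: distinct elements of an antichain have disjoint subtrees, so each of them can be assigned a leaf of its own.

```latex
\begin{align*}
|T| &= 13, \qquad |\mathrm{Leaves}(T)| = 7, \\
|\mathrm{Fil}(A)| &= 9, \qquad |\mathrm{Fil}(B)| = 8
  \qquad (\text{both bounded by } |T|), \\
|A| + |B| &= 6 \;\le\; 2\,|\mathrm{Leaves}(T)| = 14 .
\end{align*}
```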


About complexity of the approach

• The approach based on filters has a higher complexity than the approach based on RMQ. The size of a filter is $O(|T|)$, and thus the computational complexity of intersecting two antichains by means of a scaling is $O(|T|)$, which is higher than the $O(|\mathrm{Leaves}(T)|)$ needed for intersecting antichains directly.

• The intersection of antichains in arbitrary posets can be reduced to the intersection of antichains in a tree.

• When a poset is reduced to a tree, some subsumption relations may be lost and thus should be added to the reduction.

• But using such a reduction is still more computationally efficient than computing the intersection of antichains in a poset by means of a scaling.
