web scale reasoning and the larkc project

Web Scale Reasoning and the LarKC Project (Introduction) Luka Bradeško Cycorp Europe

Upload: saltlux-inc

Post on 18-Nov-2014


Page 1: Web Scale Reasoning and the LarKC Project

Web Scale Reasoning and the LarKC Project (Introduction)

Luka Bradeško, Cycorp Europe

Page 2: Web Scale Reasoning and the LarKC Project

Goals of LarKC

LarKC = Large Knowledge Collider

• Build an integrated pluggable platform for large scale reasoning

• Support for parallelization, distribution, remote execution, data storage

• Use existing plug-ins, develop new

• Easy integration of components

• Enables low cost experimentation

“Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost”

Page 3: Web Scale Reasoning and the LarKC Project

Overall approach of LarKC

• Very lightweight platform

– communication, synchronisation, registration

– LarKC = “SPARQL endpoint on steroids”

• The real work happens in the plugins

• LarKC gives you:

– very scalable datalayer

– standardised interfaces for combining components

– utilities & infrastructure to abstract from remote deployment

• Three types of LarKC users:

– people building plugins

– people configuring workflows

– people using workflows

Page 4: Web Scale Reasoning and the LarKC Project

LarKC Architecture

[Figure: LarKC architecture diagram. Components shown: Workflow Support System, Plug-in Registry, Decider, and a Plug-in Manager with Plug-in API around each plug-in (Query Transformer, Identifier, Info. Set Transformer, Selecter, Reasoner); a SPARQL Endpoint / Application on top; the Data Layer API and Data Layer (RDF Stores, RDF Docs) below; external systems and external data sources. Legend: Platform Utility Functionality, APIs, Plug-ins.]

Page 5: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API

Identifier

+ Collection<InformationSet> identify(Query theQuery, Contract contract, Context context)

QueryTransformer

+ Set<Query> transform(Query theQuery, Contract theContract, Context theContext)

InformationSetTransformer

+ InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)

Selecter

+ SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)

Reasoner

+ VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

Decider

+ VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)

+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)

+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)

+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)

• 5 types of plug-ins

• Plug-in API enables interoperability (between plug-in and platform and between plug-ins)

• Plug-ins I/O abstract data structures of RDF triples => flexibility for assembling plug-ins and for plug-in writers

• Compatibility ensured by DECIDER and workflow configurators, based on plug-in description

Page 6: Web Scale Reasoning and the LarKC Project

What does a workflow look like?

[Figure: the Decider, Plug-in Registry and Workflow Support System, with a workflow built from Query Transformer, Identifier, Info Set Transformer, Selecter and Reasoner plug-ins (each behind a Plug-in Manager and Plug-in API) and an RDF Store.]

Page 7: Web Scale Reasoning and the LarKC Project

What does a workflow look like?

[Figure: the same diagram with a workflow that includes an Info Set Transformer and two Identifiers, and with a Data Layer shown beneath the plug-ins.]

Page 8: Web Scale Reasoning and the LarKC Project

Decider Using Plug-in Registry to Create Workflow

[Figure: workflow graphs with nodes labelled Q, T, I, S, R and outputs labelled VB, B, A; reference: D 1.3.1]

Represent Properties

• Functional

• Non-functional (e.g. QoS)

• WSMO-Lite Syntax

Logical Representation

• Describes role

• Describes Inputs/Outputs

• Automatically extracted using API

• Decider can use for dynamic configuration

• Rule-based

• Fast

Page 9: Web Scale Reasoning and the LarKC Project

LarKC Plug-in Managers

[Figure: several Plug-in Managers, each wrapping a plug-in (Query Transformer, Identifier, Selector) behind the Plug-in API, with parallel copies of the Identifier and Transformer managers.]

• Run in separate threads

• Automatically add meta-data to registry when loaded

• Communicate RDF data by value or by reference

• Parallelisation

• Split/Join connectors in progress
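The "run in separate threads" point can be illustrated with a small Java sketch. This is not the LarKC Plug-in Manager code: the class `PluginManager`, its `invoke` method, and the modelling of a plug-in as a plain `Function` are all assumptions made for the example. It only shows one plausible way a manager could give each plug-in its own worker thread and pass results on to the next manager.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Illustrative stand-in, not the LarKC Plug-in Manager implementation.
class PluginManagerSketch {

    /** Wraps one plug-in (modelled here as a plain function) and runs it on its own thread. */
    static final class PluginManager<I, O> implements AutoCloseable {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final Function<I, O> plugin;

        PluginManager(Function<I, O> plugin) {
            this.plugin = plugin;
        }

        /** Hands the input to the plug-in's thread and returns a handle to the pending result. */
        CompletableFuture<O> invoke(I input) {
            return CompletableFuture.supplyAsync(() -> plugin.apply(input), worker);
        }

        @Override
        public void close() {
            worker.shutdown();
        }
    }

    /** Chains two managed plug-ins: the output of the first is the input of the second. */
    static int runPipeline(String text) {
        try (PluginManager<String, String> upper =
                     new PluginManager<String, String>(s -> s.toUpperCase());
             PluginManager<String, Integer> counter =
                     new PluginManager<String, Integer>(s -> s.length())) {
            return upper.invoke(text).thenCompose(counter::invoke).join();
        }
    }

    public static void main(String[] args) {
        System.out.println(runPipeline("larkc"));
    }
}
```

Exchange of RDF data by value or by reference is not modelled here; the sketch passes plain Java objects between threads.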

Page 10: Web Scale Reasoning and the LarKC Project

Example workflow

[Figure: example workflow with the stages Query, Identify, Transform, Select, Reason and Result; the Internet, GATE and Research Cyc appear as the resources used by the stages.]

PREFIX cyc: <http://www.cycfoundation.org/concepts/>

SELECT ?company WHERE

{ ?company cyc:mentionedInArticle "http://shodan.ijs.si/article.txt" .

  ?company cyc:isa cyc:PubliclyHeldCorporation }

Page 11: Web Scale Reasoning and the LarKC Project

LarKC Data Layer

[Figure: the LarKC architecture diagram (Pipeline Support System, Plug-in Registry, Decider, Plug-in Managers and plug-ins, Application) with the Data Layer API and the Data Layer (RDF Stores, RDF Docs), external systems and external data sources.]

Page 12: Web Scale Reasoning and the LarKC Project

LarKC Data Layer

[Figure: a Dataset made up of a Default Graph and several RDF Graphs, and a Labeled Set made up of several RDF Graphs.]

Main goal:

• The Data Layer supports LarKC plug-ins:

– storage, retrieval and light-weight inference on top of large volumes of data

– automates the exchange of RDF data by reference and by value

– offers other utility tools to manage data (e.g. merger)

Page 13: Web Scale Reasoning and the LarKC Project

Used Concepts in the Data Model

[Figure: named graphs NG1, NG2, NG3, NG4 and NG5, with labelled groups of statements.]

Page 14: Web Scale Reasoning and the LarKC Project

Supported Sets of Statements

RDF data type                 Description                     Example
Set of statements             RDF statements                  s1, p1, o1, ng1
                                                              s2, p2, o2
                                                              s3, p3, o3, ng3, {group1}
RDF graph                     Named graph                     s1, p1, o1, ng1, {group1}
                                                              s2, p2, o2, ng1, {group2}
                                                              s3, p3, o3, ng1
Dataset                       SPARQL dataset; represents      s1, p1, o1, ng1
                              a collection of graphs          s2, p2, o2, ng2
                                                              s3, p3, o3, ng3
Labelled group of statements  RDF group of statements         s1, p1, o1, ng1, {group1}
                                                              s2, p2, o2, ng2, {group1}
                                                              s3, p3, o3, ng3, {group1}
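One way to read the table above is that every statement carries an optional named graph and zero or more group labels, and that an RDF graph or a labelled group is just a filter over those fields. The Java sketch below works under that reading. The `Statement` class and the helper methods are hypothetical stand-ins for illustration, not the LarKC Data Layer types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: these classes are stand-ins, not the LarKC Data Layer types.
class DataModelSketch {

    /** A triple with an optional named graph and zero or more group labels. */
    static final class Statement {
        final String s, p, o;
        final String namedGraph;   // null when the statement is in no named graph
        final Set<String> labels;  // labels of the groups this statement belongs to

        Statement(String s, String p, String o, String namedGraph, String... labels) {
            this.s = s;
            this.p = p;
            this.o = o;
            this.namedGraph = namedGraph;
            this.labels = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(labels)));
        }
    }

    /** RDF graph: all statements that share one named graph. */
    static List<Statement> graph(List<Statement> all, String namedGraph) {
        List<Statement> out = new ArrayList<>();
        for (Statement st : all) {
            if (namedGraph.equals(st.namedGraph)) out.add(st);
        }
        return out;
    }

    /** Labelled group: all statements carrying a label, regardless of named graph. */
    static List<Statement> labelledGroup(List<Statement> all, String label) {
        List<Statement> out = new ArrayList<>();
        for (Statement st : all) {
            if (st.labels.contains(label)) out.add(st);
        }
        return out;
    }

    /** A small set of statements mixing graphs and labels, as in the table rows. */
    static List<Statement> sample() {
        return Arrays.asList(
                new Statement("s1", "p1", "o1", "ng1", "group1"),
                new Statement("s2", "p2", "o2", "ng2", "group1"),
                new Statement("s3", "p3", "o3", "ng1"),
                new Statement("s4", "p4", "o4", null));
    }

    public static void main(String[] args) {
        List<Statement> all = sample();
        System.out.println("ng1 size: " + graph(all, "ng1").size());
        System.out.println("group1 size: " + labelledGroup(all, "group1").size());
    }
}
```

In this sketch a labelled group can span several named graphs (s1 in ng1 and s2 in ng2 both carry group1), which matches the last row of the table.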

Page 15: Web Scale Reasoning and the LarKC Project

Current Status


Page 16: Web Scale Reasoning and the LarKC Project

Released System v1.1: larkc.sourceforge.net

Identify

• SINDICE

• SWOOGLE

Transform

• Annotate GATE

• Annotate Cyc

• SPARQL-CycL

Select

• Spreading Activation

• Geolocation

Reason

• Jena, IRIS, Pellet

• Cyc, PION

• Siemens

Decide

• Scripted: Real-Time City

• Dynamic Cyc

[Figure: Decider, Plug-in Registry and Pipeline Support System with the five plug-in types behind their Plug-in Managers.]

• Open Apache 2.0 license

• Previous early adopters workshops @ ESWC ’09, ’10 and ISWC ’09

– participants modified plug-ins, modified workflows

Standard Open Environment: Java, subversion, packaged release, command line build, or eclipse

Page 17: Web Scale Reasoning and the LarKC Project

Next Steps

• Requirements traceability and update

• Architecture refinement

• Distributed Data Layer

• Caching, data warming/cooling

• Data Streaming between remote components

• Experimental instrumentation and monitoring

Platform validation

Early Adopters

Page 18: Web Scale Reasoning and the LarKC Project

End


Page 19: Web Scale Reasoning and the LarKC Project

Rapid Progress, but We’re Not Finished…

[Figure: the LarKC architecture diagram (Pipeline Support System, Plug-in Registry, Decider, Plug-in Managers and plug-ins, Application, Data Layer API, Data Layer).]

Requirements (WP 5)

• Sources

– Initial Project Objectives (DoW)

– LarKC Collider Platform (WP5 discussions)

– LarKC Rapid Prototyping

– LarKC Use Cases (WP6, WP7a, WP7b)

– LarKC Plug-ins (WP2, WP3, WP4)

• Classified according to:

– Resources

– Heterogeneity

– Usage

– Interoperability

– Parallelization “within plug-ins”

– Distributed/remote execution

– Data Layer

– Data Caching

– Anytime Behaviour

– Plug-in Registration and Discovery

– Plug-in Monitoring and Measurement

– Support for Developers

– Plug-ins

Detailed information in D5.3.1 Requirements Analysis and report on lessons learned during prototyping

• Concentrating on parallel and distributed execution.

• Optimisation of complex workflows.

• Extend meta-data representation for QoS, parallelism and use it.

• Concentrating on parallel and distributed data layer; caching and data migration.

• Support more plug-in needs while maintaining platform integrity

• Integrate distributed plug-ins

• Support workflows inspired by human cognition (e.g. workflow interruption for optimal stopping)

• Support anytime/streaming

• Experimental instrumentation and monitoring

Page 20: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API: General Plug-in Model

Plug-in

+ URI getIdentifier()

+ QoSInformation getQoSInformation()

Plug-in description

• Functional properties

• Non-functional properties

• WSDL description

• Plug-ins are assembled into Workflows, to realise a LarKC Experiment or Application

• Plug-ins are identified by a URI (Uniform Resource Identifier)

• Plug-ins provide MetaData about what they do (Functional properties): e.g. type = Selecter

• Plug-ins provide information about their behaviour and needs, including Quality of Service information (Non-functional properties): e.g. Throughput, MinMemory, Cost,…

• Plug-ins can be provided with a Contract that tells them how to behave (e.g. Contract : “give me the next 10 results”) and Context information used to store state between invocations
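The interplay of Contract and Context can be sketched in Java. Only `getIdentifier()` and `getQoSInformation()` come from the slide. The fields of `QoSInformation`, `Contract` and `Context`, the `CountingPlugin` class and its `next` method are assumptions made for illustration. The sketch shows a plug-in asked twice for "the next 10 results" that resumes where it stopped because the offset lives in the Context.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins; only the two Plugin method names come from the slide.
class PluginModelSketch {

    /** Non-functional properties; the single field here is a hypothetical example. */
    static final class QoSInformation {
        final int minMemoryMb;
        QoSInformation(int minMemoryMb) { this.minMemoryMb = minMemoryMb; }
    }

    interface Plugin {
        URI getIdentifier();
        QoSInformation getQoSInformation();
    }

    /** Contract: how many results the caller wants from this invocation. */
    static final class Contract {
        final int maxResults;
        Contract(int maxResults) { this.maxResults = maxResults; }
    }

    /** Context: state that survives between invocations of the same plug-in. */
    static final class Context {
        private final Map<String, Integer> state = new HashMap<>();
        int get(String key) { return state.getOrDefault(key, 0); }
        void put(String key, int value) { state.put(key, value); }
    }

    /** Toy plug-in that produces consecutive integers, a page at a time. */
    static final class CountingPlugin implements Plugin {
        @Override
        public URI getIdentifier() { return URI.create("urn:example:counting-plugin"); }

        @Override
        public QoSInformation getQoSInformation() { return new QoSInformation(64); }

        List<Integer> next(Contract contract, Context context) {
            int start = context.get("offset");
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < contract.maxResults; i++) out.add(start + i);
            context.put("offset", start + contract.maxResults);  // remember where we stopped
            return out;
        }
    }

    public static void main(String[] args) {
        CountingPlugin plugin = new CountingPlugin();
        Context context = new Context();
        Contract nextTen = new Contract(10);
        System.out.println(plugin.next(nextTen, context));
        System.out.println(plugin.next(nextTen, context));
    }
}
```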

Page 21: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API: IDENTIFY

Identifier

+ Collection<InformationSet> identify(Query theQuery, Contract contract, Context context)

• IDENTIFY: Given a query, identify resources that could be used to answer it

• Sindice – Triple Pattern Query → RDF Graphs

• Google – Keyword Query → Natural Language Document

• Triple Store – SPARQL Query → RDF Graphs
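A minimal Java sketch of an Identifier follows, assuming a keyword query and an in-memory index in place of a service such as Sindice. The `identify` signature mirrors the slide. The bodies of `Query`, `InformationSet`, `Contract` and `Context`, and the `InMemoryIdentifier` class, are hypothetical stand-ins rather than the LarKC classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types for illustration; not the LarKC API classes.
class IdentifySketch {

    static final class Query {
        final String keyword;
        Query(String keyword) { this.keyword = keyword; }
    }

    static final class InformationSet {
        final String source;  // identifier of a resource that may help answer the query
        InformationSet(String source) { this.source = source; }
    }

    static final class Contract {
        final int maxResults;
        Contract(int maxResults) { this.maxResults = maxResults; }
    }

    static final class Context { }

    interface Identifier {
        Collection<InformationSet> identify(Query theQuery, Contract contract, Context context);
    }

    /** Looks the keyword up in a fixed in-memory index and honours the contract's limit. */
    static final class InMemoryIdentifier implements Identifier {
        private final Map<String, List<String>> index = new HashMap<>();

        InMemoryIdentifier register(String keyword, String... sources) {
            index.put(keyword, Arrays.asList(sources));
            return this;
        }

        @Override
        public Collection<InformationSet> identify(Query theQuery, Contract contract, Context context) {
            List<InformationSet> found = new ArrayList<>();
            List<String> sources = index.getOrDefault(theQuery.keyword, Collections.<String>emptyList());
            for (String source : sources) {
                if (found.size() == contract.maxResults) break;
                found.add(new InformationSet(source));
            }
            return found;
        }
    }

    static Identifier sample() {
        return new InMemoryIdentifier()
                .register("company", "urn:example:graph1", "urn:example:graph2", "urn:example:graph3");
    }

    public static void main(String[] args) {
        Collection<InformationSet> hits =
                sample().identify(new Query("company"), new Contract(2), new Context());
        for (InformationSet hit : hits) System.out.println(hit.source);
    }
}
```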

Page 22: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API: TRANSFORM (1/2)

QueryTransformer

+ Set<Query> transform(Query theQuery, Contract theContract, Context theContext)

• Query TRANSFORM: Transforms a query from one representation to another

• SPARQL Query → Triple Pattern Query

• SPARQL Query → Keyword Query

• SPARQL Query → SPARQL Query (different abstraction)

• SPARQL Query → CycL Query
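The "SPARQL Query → Keyword Query" case can be sketched in Java with a deliberately naive transformer that treats every quoted literal in the query text as one keyword query. It does not parse SPARQL, and real transformers would be more careful. The `transform` signature mirrors the slide; `Query`, `Contract`, `Context` and `LiteralKeywordTransformer` are hypothetical stand-ins.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in types for illustration; not the LarKC API classes.
class QueryTransformSketch {

    static final class Query {
        final String text;
        Query(String text) { this.text = text; }

        @Override
        public boolean equals(Object other) {
            return other instanceof Query && ((Query) other).text.equals(text);
        }

        @Override
        public int hashCode() { return text.hashCode(); }
    }

    static final class Contract { }

    static final class Context { }

    interface QueryTransformer {
        Set<Query> transform(Query theQuery, Contract theContract, Context theContext);
    }

    /** Naive transformer: each double-quoted literal in the input becomes a keyword query. */
    static final class LiteralKeywordTransformer implements QueryTransformer {
        private static final Pattern LITERAL = Pattern.compile("\"([^\"]*)\"");

        @Override
        public Set<Query> transform(Query theQuery, Contract theContract, Context theContext) {
            Set<Query> keywords = new LinkedHashSet<>();
            Matcher m = LITERAL.matcher(theQuery.text);
            while (m.find()) {
                keywords.add(new Query(m.group(1)));
            }
            return keywords;
        }
    }

    public static void main(String[] args) {
        Query sparql = new Query(
                "SELECT ?c WHERE { ?c ex:label \"Acme\" . ?c ex:city \"Ljubljana\" }");
        for (Query q : new LiteralKeywordTransformer().transform(sparql, new Contract(), new Context())) {
            System.out.println(q.text);
        }
    }
}
```

Returning a `Set<Query>` rather than a single query lets one input fan out into several downstream queries, which is what the slide's signature suggests.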

Page 23: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API: TRANSFORM (2/2)

InformationSetTransformer

+ InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)

• Information Set TRANSFORM: Transforms data from one representation to another

• Natural Language Document → RDF Graph

• Structured Data Sources → RDF Graph

• RDF Graph → RDF Graph (e.g. foaf vocabulary to facebook vocabulary)
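The "RDF Graph → RDF Graph" case can be sketched in Java as a predicate rewrite driven by a mapping table. The `transform` signature mirrors the slide. The `Triple` class, the contents of `InformationSet`, the `VocabularyMapper` class and the `ex:` target vocabulary are assumptions chosen for illustration, not LarKC types or a real vocabulary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types for illustration; not the LarKC API classes.
class InfoSetTransformSketch {

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static final class InformationSet {
        final List<Triple> triples;
        InformationSet(List<Triple> triples) { this.triples = triples; }
    }

    static final class Contract { }

    static final class Context { }

    interface InformationSetTransformer {
        InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext);
    }

    /** Rewrites predicates found in the mapping and copies every other triple unchanged. */
    static final class VocabularyMapper implements InformationSetTransformer {
        private final Map<String, String> predicateMap;

        VocabularyMapper(Map<String, String> predicateMap) {
            this.predicateMap = new HashMap<>(predicateMap);
        }

        @Override
        public InformationSet transform(InformationSet theInformationSet, Contract theContract,
                                        Context theContext) {
            List<Triple> out = new ArrayList<>();
            for (Triple t : theInformationSet.triples) {
                out.add(new Triple(t.s, predicateMap.getOrDefault(t.p, t.p), t.o));
            }
            return new InformationSet(out);
        }
    }

    /** Maps one foaf predicate onto a hypothetical ex: vocabulary. */
    static InformationSet sample() {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("foaf:name", "ex:fullName");
        InformationSet input = new InformationSet(Arrays.asList(
                new Triple("ex:alice", "foaf:name", "Alice"),
                new Triple("ex:alice", "foaf:knows", "ex:bob")));
        return new VocabularyMapper(mapping).transform(input, new Contract(), new Context());
    }

    public static void main(String[] args) {
        for (Triple t : sample().triples) {
            System.out.println(t.s + " " + t.p + " " + t.o);
        }
    }
}
```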

Page 24: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API: SELECT

Selecter

+ SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)

• SELECT: Given a set of statements (e.g. a number of RDF Graphs) will choose a selection/sample from this set

– Collection of RDF Graphs → Triple Set (Merged)

– Collection of RDF Graphs → Triple Set (10% of each)

– Collection of RDF Graphs → Triple Set (N Triples)
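Two of the selection strategies listed above, "Merged" and "N Triples", can be sketched in Java. The `select` signature mirrors the slide. The contents of `SetOfStatements` (statements held as plain strings), the `maxStatements` field of `Contract`, the `merge` helper and the `FirstNSelecter` class are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Stand-in types for illustration; not the LarKC API classes.
class SelectSketch {

    static final class SetOfStatements {
        final List<String> statements;  // each statement kept as an "s p o" string
        SetOfStatements(Collection<String> statements) {
            this.statements = new ArrayList<>(statements);
        }
    }

    static final class Contract {
        final int maxStatements;
        Contract(int maxStatements) { this.maxStatements = maxStatements; }
    }

    static final class Context { }

    interface Selecter {
        SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context);
    }

    /** "Merged": union of all graphs, dropping duplicate statements, keeping first-seen order. */
    static SetOfStatements merge(List<List<String>> graphs) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> graph : graphs) merged.addAll(graph);
        return new SetOfStatements(merged);
    }

    /** "N Triples": keeps at most the number of statements the contract allows. */
    static final class FirstNSelecter implements Selecter {
        @Override
        public SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract,
                                      Context context) {
            int n = Math.min(contract.maxStatements, theSetOfStatements.statements.size());
            return new SetOfStatements(theSetOfStatements.statements.subList(0, n));
        }
    }

    /** Merges two overlapping graphs, then selects at most n statements. */
    static SetOfStatements sample(int n) {
        List<List<String>> graphs = Arrays.asList(
                Arrays.asList("s1 p1 o1", "s2 p2 o2"),
                Arrays.asList("s2 p2 o2", "s3 p3 o3"));
        return new FirstNSelecter().select(merge(graphs), new Contract(n), new Context());
    }

    public static void main(String[] args) {
        System.out.println(sample(2).statements);
    }
}
```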

Page 25: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API: REASON

Reasoner

+ VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)

• REASON: Executes a query against the supplied set of statements

– SPARQL Query → Variable Binding (Select)

– SPARQL Query → Set of statements (Construct)

– SPARQL Query → Set of statements (Describe)

– SPARQL Query → Boolean (Ask)
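The difference between the Select and Ask result shapes can be sketched in Java with a toy matcher for a single triple pattern. This is far simpler than SPARQL and is not a LarKC reasoner: the `select` and `ask` methods, the representation of statements as `String[3]`, and the `?name` variable convention are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy single-pattern matcher; a stand-in for a real SPARQL-capable reasoner.
class ReasonSketch {

    /**
     * Select analogue: matches one triple pattern against the statements.
     * Terms starting with '?' are variables; the result holds one binding map per match.
     */
    static List<Map<String, String>> select(String[] pattern, List<String[]> statements) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String[] statement : statements) {
            Map<String, String> binding = new LinkedHashMap<>();
            boolean matches = true;
            for (int i = 0; i < 3 && matches; i++) {
                String term = pattern[i];
                if (term.startsWith("?")) {
                    // A variable used twice must bind to the same value both times.
                    String previous = binding.putIfAbsent(term, statement[i]);
                    matches = previous == null || previous.equals(statement[i]);
                } else {
                    matches = term.equals(statement[i]);
                }
            }
            if (matches) results.add(binding);
        }
        return results;
    }

    /** Ask analogue: true when the pattern has at least one match. */
    static boolean ask(String[] pattern, List<String[]> statements) {
        return !select(pattern, statements).isEmpty();
    }

    static List<String[]> sample() {
        return Arrays.asList(
                new String[] {"ex:acme", "ex:isa", "ex:Corporation"},
                new String[] {"ex:bob", "ex:isa", "ex:Person"},
                new String[] {"ex:acme", "ex:mentionedIn", "ex:article1"});
    }

    public static void main(String[] args) {
        List<String[]> data = sample();
        System.out.println(select(new String[] {"?x", "ex:isa", "ex:Corporation"}, data));
        System.out.println(ask(new String[] {"ex:bob", "ex:isa", "ex:Corporation"}, data));
    }
}
```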

Page 26: Web Scale Reasoning and the LarKC Project

LarKC Plug-in API: DECIDE

Decider

+ VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)

+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)

+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)

+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)

• DECIDE: Builds the workflow and manages the control flow

– Scripted Decider: Predefined workflow is built and executed

– Self-configuring Decider: Uses plug-in descriptions (functional and non-functional properties) to build the workflow
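A scripted decider, in the sense of a predefined workflow that is built once and then executed step by step, can be sketched in Java as an ordered list of named steps. The `Step` and `ScriptedDecider` classes, the use of `List<String>` as the data passed between steps, and the three toy steps are hypothetical. The real Decider exposes the SPARQL-style methods listed above and is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative stand-in for a scripted decider; not the LarKC Decider API.
class DecideSketch {

    /** One workflow step: a name plus the work it performs on the data passed along. */
    static final class Step {
        final String name;
        final UnaryOperator<List<String>> action;

        Step(String name, UnaryOperator<List<String>> action) {
            this.name = name;
            this.action = action;
        }
    }

    /** Runs a fixed, predefined sequence of steps and records the order of execution. */
    static final class ScriptedDecider {
        private final List<Step> workflow;
        private final List<String> trace = new ArrayList<>();

        ScriptedDecider(Step... steps) {
            this.workflow = Arrays.asList(steps);
        }

        List<String> run(List<String> input) {
            List<String> data = input;
            for (Step step : workflow) {
                trace.add(step.name);
                data = step.action.apply(data);
            }
            return data;
        }

        List<String> trace() { return trace; }
    }

    /** Toy workflow: identify adds a source, select keeps the first, reason labels it. */
    static ScriptedDecider sample() {
        return new ScriptedDecider(
                new Step("identify", in -> {
                    List<String> out = new ArrayList<>(in);
                    out.add("doc2");
                    return out;
                }),
                new Step("select", in -> new ArrayList<>(in.subList(0, 1))),
                new Step("reason", in -> {
                    List<String> out = new ArrayList<>();
                    for (String item : in) out.add("answer:" + item);
                    return out;
                }));
    }

    public static void main(String[] args) {
        ScriptedDecider decider = sample();
        System.out.println(decider.run(Arrays.asList("doc1")));
        System.out.println(decider.trace());
    }
}
```

A self-configuring decider would differ in how `workflow` is produced: instead of a hard-coded list it would choose steps from plug-in descriptions, which this sketch does not attempt.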