Linked Data at present Using Linked Data Ferdowsi University of Mashhad Web Technology Lab. (WTLab), www.wtlab.um.ac.ir Linked Data Group (LDG) Mahboubeh Dadkhah May 11, 2011

Page 1: Linked Data at present Using Linked Data

Linked Data at present
Using Linked Data

Ferdowsi University of Mashhad
Web Technology Lab. (WTLab), www.wtlab.um.ac.ir

Linked Data Group (LDG)

Mahboubeh Dadkhah
May 11, 2011

Page 2: Linked Data at present Using Linked Data

You may already know

Linked Data

Page 3: Linked Data at present Using Linked Data

History
• Linked Data Design Issues by TimBL, July 2006
• Linked Open Data Project, WWW2007
• First LOD Cloud, May 2007
• 1st Linked Data on the Web Workshop, WWW2008
• 1st Triplification Challenge, 2008
• How to Publish Linked Data Tutorial, ISWC2008
• BBC publishes Linked Data, 2008
• 2nd Linked Data on the Web Workshop, WWW2009
• NY Times announcement, SemTech2009 - ISWC09
• 1st Linked Data-a-thon, ISWC2009
• 1st How to Consume Linked Data Tutorial, ISWC2009
• Data.gov.uk publishes Linked Data, 2010
• 2nd How to Consume Linked Data Tutorial, WWW2010
• 1st International Workshop on Consuming Linked Data, COLD2010
• …

Page 4: Linked Data at present Using Linked Data

May 2007

Page 5: Linked Data at present Using Linked Data
Page 6: Linked Data at present Using Linked Data

Cloud statistics

Page 7: Linked Data at present Using Linked Data

Now that Linked Data is here

What to do next?

Let’s Make Use of It

Page 8: Linked Data at present Using Linked Data

Linked Data
• Before using it, we should be sure that we understand what it means.
• What was the problem: Searching and Finding

Search for

Football players who went to the University of Texas at Austin and played for the Dallas Cowboys as cornerback

Page 9: Linked Data at present Using Linked Data
Page 10: Linked Data at present Using Linked Data

Current Web = internet + links + docs

Why can’t we find it?

Page 11: Linked Data at present Using Linked Data

So, what to do?

• Make it easy for computers/software to find THINGS

Publish Things

• As data
• In a standardized way: RDF
• RDF data is serialized in different ways:
  – RDF/XML, RDFa, N3, Turtle, JSON
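
To make "publish things as data" concrete, here is a minimal Java sketch that writes two statements about a book in Turtle, one triple per line with full IRIs. It uses only the standard library; a real application would more likely use an RDF library. The class names and every example.org IRI are hypothetical placeholders. Only the book title comes from this talk.

```java
import java.util.List;

/** Toy RDF statement. All example.org IRIs used below are hypothetical placeholders. */
final class RdfStatement {
    final String subject, predicate, object;
    final boolean literal;   // true: object is a plain string; false: object is an IRI

    RdfStatement(String subject, String predicate, String object, boolean literal) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.literal = literal;
    }

    /** Renders the statement as one Turtle line with full IRIs (no prefixes). */
    String toTurtle() {
        String obj = literal
                ? "\"" + object.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""
                : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }
}

class PublishThing {
    /** A tiny description of a book: one literal-valued and one IRI-valued property. */
    static List<RdfStatement> bookDescription() {
        String book = "http://example.org/book/1";   // hypothetical IRI for the book
        return List.of(
            new RdfStatement(book, "http://purl.org/dc/terms/title",
                    "Programming the Semantic Web", true),
            new RdfStatement(book, "http://purl.org/dc/terms/publisher",
                    "http://example.org/publisher/1", false));
    }

    public static void main(String[] args) {
        bookDescription().forEach(s -> System.out.println(s.toTurtle()));
    }
}
```

The same statements could equally be serialized as RDF/XML or RDFa; the triples stay the same and only the syntax changes.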

Page 12: Linked Data at present Using Linked Data

[Figure: RDF graph describing a book. The resource http://…/isbn978 has the title "Programming the Semantic Web", the isbn 978-0-596-15381-6, the author Toby Segaran, and the publisher http://…/publisher1 (name: O’Reilly). It is linked by sameAs to a second http://…/isbn978 resource, which hasReview http://…/review1 (description: "Awesome Book"). The review hasReviewer http://…/reviewer (name: Juan Sequeda), which is sameAs http://juansequeda.com/id; that resource has the name Juan Sequeda and livesIn http://dbpedia.org/Austin.]

Page 13: Linked Data at present Using Linked Data

2009’s Top 10 Linked Data Research Issues

• Data Linking and Fusion
  – linking algorithms and heuristics, identity resolution
  – Web data integration and data fusion
  – evaluating quality and trustworthiness of Linked Data
• Linked Data Application Architectures
  – crawling, caching and querying Linked Data on the Web; optimizations, performance
  – Linked Data browsers, search engines
  – applications that exploit distributed Web datasets
• Data Publishing
  – tools for publishing large data sources as Linked Data on the Web (e.g. relational databases, XML repositories)
  – embedding data into classic Web documents (e.g. GRDDL, RDFa, Microformats)
  – licensing and provenance tracking issues in Linked Data publishing
  – business models for Linked Data publishing and consumption

Page 14: Linked Data at present Using Linked Data

2010’s Top 10 Linked Data Research Issues

• Linked Data Application Architectures
  – crawling, caching and querying Linked Data
  – dataset dynamics and synchronization
  – Linked Data mining
• Data Linking and Data Fusion
  – linking algorithms and heuristics, identity resolution
  – Web data integration and data fusion
  – link maintenance
  – performance of linking infrastructures/algorithms on Web data
• Quality, Trust and Provenance in Linked Data
  – tracking provenance and usage of Linked Data
  – evaluating quality and trustworthiness of Linked Data
  – profiling of Linked Data sources
• User Interfaces for the Web of Data
  – approaches to visualizing and interacting with distributed Web data
  – Linked Data browsers and search engines
• Data Publishing
  – tools for publishing large data sources as Linked Data on the Web (e.g. relational databases, XML repositories)
  – embedding data into classic Web documents (e.g. RDFa, Microformats)
  – describing data on the Web (e.g. voiD, semantic site maps)
  – licensing issues in Linked Data publishing

Page 15: Linked Data at present Using Linked Data

2011’s Top 10 Linked Data Research Issues

• Foundations of Linked Data
  – Web architecture and dataspace theory
  – dataset dynamics and synchronisation
  – analyzing and profiling the Web of Data
• Data Linking and Fusion
  – entity consolidation and linking algorithms
  – Web-based data integration and data fusion
  – performance and scalability of integration architectures
• Write-enabling the Web of Data
  – access authentication mechanisms for Linked Datasets (WebID, etc.)
  – authorisation mechanisms for Linked Datasets (WebACL, etc.)
  – enabling write-access to legacy data sources (Google APIs, Flickr API, etc.)
• Data Publishing
  – publishing legacy data sources as Linked Data on the Web
  – cost-benefits of the 5 star LOD plan
• Data Usage
  – tracking provenance of Linked Data
  – evaluating quality and trustworthiness of Linked Data
  – licensing issues in Linked Data publishing
  – distributed query of Linked Data
  – RDF-to-X, turning RDF to legacy data
• Interacting with the Web of Data
  – approaches to visualising Linked Data
  – interacting with distributed Web data
  – Linked Data browsers, indexers and search engines

Page 16: Linked Data at present Using Linked Data

Linked Data makes the web appear as

ONE GIANT HUGE

GLOBAL DATABASE!

Page 17: Linked Data at present Using Linked Data

Do you remember

Search and Find

?

Page 18: Linked Data at present Using Linked Data

Query Linked Data with SPARQL Endpoints

• Linked Data sources usually provide a SPARQL endpoint for their dataset(s)
• SPARQL endpoint: a SPARQL query processing service that supports the SPARQL protocol*
• Send your SPARQL query, receive the result

* http://www.w3.org/TR/rdf-sparql-protocol/
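
"Send your SPARQL query, receive the result" can be sketched with the Java standard library alone (Java 11 or later). In the SPARQL protocol, a query sent over HTTP GET travels URL-encoded in the `query` parameter. The class and method names below are illustrative, not part of any library. The sketch assumes the endpoint URL has no query string of its own and asks for JSON results through the Accept header.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SparqlProtocolSketch {

    /** Builds the GET request URI: the query text goes into the "query" parameter. */
    static URI buildRequestUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    /** Sends the query and returns the raw response body (requires network access). */
    static String send(String endpoint, String sparql) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(buildRequestUri(endpoint, sparql))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the request URI; call send(...) to actually contact the endpoint.
        System.out.println(buildRequestUri("http://dbpedia.org/sparql",
                "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"));
    }
}
```

Libraries such as Jena wrap this exchange and also parse the result set, which is usually more convenient than handling the HTTP response by hand.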

Page 19: Linked Data at present Using Linked Data

http://www.w3.org/wiki/SparqlEndpoints

Page 20: Linked Data at present Using Linked Data

http://labs.mondeca.com/sparqlEndpointsStatus/

Page 21: Linked Data at present Using Linked Data

SPARQL queries over multiple datasets

How to do this?

1. Issue follow-up queries to different endpoints
2. Query a central collection of datasets
3. Build a store with copies of relevant datasets
4. Use a query federation system

Page 22: Linked Data at present Using Linked Data

1 - Follow-up Queries

• Idea: issue follow-up queries over other datasets based on results from previous queries

• Substituting placeholders in query templates

Page 23: Linked Data at present Using Linked Data

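// Jena ARQ imports (releases before Apache Jena 3 use the com.hp.hpl.jena.query package instead).
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";

// Template for the follow-up query; %s is replaced by a URI from the first result set.
String qTmpl = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
             + "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
// First query: find a list of companies filtered by some criteria and return their DBpedia URIs.
String q1 = "SELECT ?s WHERE { ...";

QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while (results1.hasNext()) {
    QuerySolution sol1 = results1.nextSolution();
    // Substitute the URI bound to ?s into the template and query the second endpoint.
    String q2 = String.format(qTmpl, sol1.getResource("s").getURI());
    QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
    ResultSet results2 = e2.execSelect();
    while (results2.hasNext()) {
        // ...
    }
    e2.close();
}
e1.close();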

Page 24: Linked Data at present Using Linked Data

1 - Follow-up Queries

Advantage
– Queried data is up to date

× Drawbacks
– Requires the existence of a SPARQL endpoint for each dataset
– Requires program logic
– Very inefficient

Page 25: Linked Data at present Using Linked Data

2 - Querying a Collection of Datasets

• Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets
• Example:
  – SPARQL endpoint over a majority of datasets from the LOD cloud at:

http://lod.openlinksw.com/sparql

http://uberblic.org

Page 26: Linked Data at present Using Linked Data

(Linked) Data Marketplaces
• FactForge
  – Integrates some of the most central LOD datasets
  – General-purpose information (not specific to a domain)
  – 1.2 billion explicit and 1 billion inferred statements
  – The largest upper-level knowledge base
  – http://www.FactForge.net
• LinkedLifeData
  – 25 of the most popular life-science datasets
  – 2.7 billion explicit and 1.4 billion inferred statements
  – http://www.LinkedLifeData.com

Page 27: Linked Data at present Using Linked Data
Page 28: Linked Data at present Using Linked Data

2 - Querying a Collection of Datasets

Advantage
– No need for specific program logic

× Drawbacks
– Queried data might be out of date
– Not all relevant datasets in the collection

Page 29: Linked Data at present Using Linked Data

3 - Own Store of Dataset Copies

• Idea: Build your own store with copies of relevant datasets and query it
• Possible stores:
  – Jena TDB http://jena.hpl.hp.com/wiki/TDB
  – Sesame http://www.openrdf.org/
  – OpenLink Virtuoso http://virtuoso.openlinksw.com/
  – 4store http://4store.org/
  – AllegroGraph http://www.franz.com/agraph/
  – etc.

Page 30: Linked Data at present Using Linked Data

3 - Own Store of Dataset Copies

Advantages
– No need for specific program logic
– Can include all datasets
– Independent of the existence, availability, and efficiency of SPARQL endpoints

× Drawbacks
– Requires effort to set up and to operate the store
– Ideally, data sources provide RDF dumps; if not?
– How to keep the copies in sync with the originals?
– Queried data might be out of date

Page 31: Linked Data at present Using Linked Data

4 - Federated Query Processing

• Idea: Querying a mediator which distributes sub-queries to relevant sources and integrates the results
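
The mediator idea can be sketched as a toy in plain Java. Each `Source` stands in for a remote SPARQL endpoint and declares which predicates it can answer. The mediator sends each triple pattern only to the sources that can answer it and joins the partial results. All names, the `ex:` identifiers and the data are made up for illustration; real federation engines work on full SPARQL queries and richer source descriptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class FederationSketch {

    /** Toy statement with subject, predicate and object as plain strings. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** Stand-in for a remote endpoint; capabilities lists the predicates it can answer. */
    static final class Source {
        final Set<String> capabilities;
        final List<Triple> data;
        Source(Set<String> capabilities, List<Triple> data) {
            this.capabilities = capabilities;
            this.data = data;
        }
        /** Answers the sub-query "?s <predicate> ?o" over this source's data. */
        List<Triple> match(String predicate) {
            List<Triple> out = new ArrayList<>();
            for (Triple t : data) if (t.p.equals(predicate)) out.add(t);
            return out;
        }
    }

    /** Source selection: send the sub-query only to sources that declare the predicate. */
    static List<Triple> subQuery(List<Source> sources, String predicate) {
        List<Triple> out = new ArrayList<>();
        for (Source src : sources)
            if (src.capabilities.contains(predicate)) out.addAll(src.match(predicate));
        return out;
    }

    /** Evaluates { ?a p1 ?b . ?b p2 ?c } by joining the two sub-query results on ?b. */
    static List<String> joinQuery(List<Source> sources, String p1, String p2) {
        List<String> rows = new ArrayList<>();
        for (Triple left : subQuery(sources, p1))
            for (Triple right : subQuery(sources, p2))
                if (left.o.equals(right.s)) rows.add(left.s + " | " + left.o + " | " + right.o);
        return rows;
    }

    /** Two made-up datasets: companies with locations, and places with a population value. */
    static List<Source> demoSources() {
        Source companies = new Source(Set.of("ex:locatedIn"),
                List.of(new Triple("ex:Acme", "ex:locatedIn", "ex:Austin")));
        Source places = new Source(Set.of("ex:population"),
                List.of(new Triple("ex:Austin", "ex:population", "1000")));
        return List.of(companies, places);
    }

    public static void main(String[] args) {
        System.out.println(joinQuery(demoSources(), "ex:locatedIn", "ex:population"));
    }
}
```

The application issues a single query. Which source answers which part is the mediator's concern, which is why this approach needs no application-specific program logic.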

Page 32: Linked Data at present Using Linked Data

4 - Federated Query Processing

• DARQ (Distributed ARQ)
  – http://darq.sourceforge.net/
  – Query engine for federated SPARQL queries
  – Extension of ARQ (query engine for Jena)
  – Last update: June 28, 2006
• Semantic Web Integrator and Query Engine (SemWIQ)
  – http://semwiq.sourceforge.net/
  – Actively maintained!

Page 33: Linked Data at present Using Linked Data

4 - Federated Query Processing

Advantages
– No need for specific program logic
– Queried data is up to date

× Drawbacks
– Requires the existence of a SPARQL endpoint for each dataset
– Requires effort to set up and configure the mediator

Page 34: Linked Data at present Using Linked Data

In any case
• You have to know the relevant data sources
  – When developing the app using follow-up queries
  – When selecting an existing SPARQL endpoint over a collection of dataset copies
  – When setting up your own store with a collection of dataset copies
  – When configuring your query federation system

• You restrict yourself to the selected sources

Automated Link Traversal

Idea: Discover further data by looking up relevant URIs in your application

Can be combined with the previous approaches

Page 35: Linked Data at present Using Linked Data

Link Traversal Based Query Execution

• Applies the idea of automated link traversal to the execution of SPARQL queries

• Idea:
  – Intertwine query evaluation with traversal of RDF links
  – Discover data that might contribute to query results during query execution
• Alternately:
  – Evaluate parts of the query
  – Look up URIs in intermediate solutions
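
The alternation between evaluating part of the query and looking up URIs can be sketched in plain Java. The "Web" here is simulated by a map from a URI to the triples of the document behind it, so no HTTP is involved. The class, the example.org URIs and the data are hypothetical. The sketch handles only a fixed two-pattern query, not general SPARQL.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LinkTraversalSketch {

    /** Toy statement with subject, predicate and object as plain strings. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    final Map<String, List<Triple>> web;            // simulated Web: URI -> document triples
    final List<Triple> local = new ArrayList<>();   // data discovered so far
    final Set<String> visited = new HashSet<>();    // URIs already looked up

    LinkTraversalSketch(Map<String, List<Triple>> web) { this.web = web; }

    /** "Dereferences" a URI once and adds the retrieved triples to the local data. */
    void lookUp(String uri) {
        if (visited.add(uri)) local.addAll(web.getOrDefault(uri, List.of()));
    }

    /** Evaluates the pattern "<subject> <predicate> ?o" over the data discovered so far. */
    List<String> objectsOf(String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (Triple t : local)
            if (t.s.equals(subject) && t.p.equals(predicate)) out.add(t.o);
        return out;
    }

    /** Evaluates { <seed> p1 ?x . ?x p2 ?y } and returns the bindings of ?y. */
    List<String> execute(String seed, String p1, String p2) {
        lookUp(seed);                                // start from the URI named in the query
        List<String> ys = new ArrayList<>();
        for (String x : objectsOf(seed, p1)) {       // evaluate the first part of the query
            lookUp(x);                               // look up the URI in the intermediate solution
            ys.addAll(objectsOf(x, p2));             // evaluate the second part on the new data
        }
        return ys;
    }

    /** Two made-up documents: a person who lives in a city, and the city's label. */
    static Map<String, List<Triple>> demoWeb() {
        String juan = "http://example.org/juan", austin = "http://example.org/Austin";
        return Map.of(
            juan, List.of(new Triple(juan, "http://example.org/livesIn", austin)),
            austin, List.of(new Triple(austin, "http://example.org/label", "Austin")));
    }

    public static void main(String[] args) {
        System.out.println(new LinkTraversalSketch(demoWeb()).execute(
                "http://example.org/juan", "http://example.org/livesIn", "http://example.org/label"));
    }
}
```

The second document is never named in the query. It is discovered only because the first document links to it.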

Page 36: Linked Data at present Using Linked Data

Link Traversal Based Query Execution

Page 46: Linked Data at present Using Linked Data

Link Traversal Based Query Execution

Advantages
– No need to know all data sources in advance
– No need for specific programming logic
– Queried data is up to date
– Does not depend on the existence of SPARQL endpoints provided by the data sources

× Drawbacks
– Not as fast as a centralized collection of copies
– Unsuitable for some queries
– Results might be incomplete (do we care?)

Page 47: Linked Data at present Using Linked Data

Implementations

• Semantic Web Client library (SWClLib) for Java http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
• SWIC for Prolog http://moustaki.org/swic/
• SQUIN http://squin.org
  – Provides SWClLib functionality as a Web service
  – Accessible like a SPARQL endpoint

Page 48: Linked Data at present Using Linked Data

Real World Example

Page 49: Linked Data at present Using Linked Data

What is a Linked Data application?

Software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets

Page 50: Linked Data at present Using Linked Data

Characteristics of Linked Data Applications

• Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data.

• Discover further information by following the links between different data sources: the fourth Linked Data principle (include links to other URIs) enables this.

• Combine the consumed Linked Data with data from other sources (not necessarily Linked Data).

• Expose the combined data back to the web following the Linked Data principles.

• Offer value to end-users.

Page 51: Linked Data at present Using Linked Data

1st Linked Data-a-thon
• Co-located with the 8th International Semantic Web Conference (ISWC 2009)
• The overall goal of this event was to
  – Create a Linked Data application that shows innovative and new functionality
  – Show that a "quick and dirty" Linked Data application can be developed in 3 days

Winners
• United States Linked Data Overlay
  – Use Linked Data about geographical locations and display it on Google Earth.
• www.diversity-search.info
  – Web and image search engine augmented with Linked Data
  – Pictures of David Beckham playing football in the different clubs he has played for
• Find traditional Chinese medicine as an alternative to western drugs
• iGoogr: Imagine Google was using the GoodRelations vocabulary for e-commerce

Page 52: Linked Data at present Using Linked Data

Self-Service Development of Linked Data Applications

• Semantic cloud computing
• The Information Workbench as a self-service platform for the fast development of domain-specific Linked Data solutions
• Designed with the goal to leverage Linked Data deployment in the enterprise
• Implements concepts and features for data integration, interactive visualization, exploration and analytics, as well as the collaborative acquisition and authoring of Linked Data
• Data sources can be dynamically integrated at the click of a button
• The user interface can be flexibly customized based on a large, extensible collection of widgets supporting data visualization, exploration, and collaboration

Page 53: Linked Data at present Using Linked Data

Self-Service Development of Linked Data Applications

• Platform for Linked Data Application Development
  – Base functionality to build applications without any programming
  – SDK for easy extensions
  – Available in Open Source at http://iwb.fluidops.com/
• Covering the entire lifecycle of interacting with Linked Data
  – Discovery of data sources
  – Integration of data sources
  – Visualization
  – Search and Exploration
  – Collaborative generation of data
• Targeted at
  – Linked Open Data, Linked Government Data
  – Linked Enterprise Data
  – Combinations thereof

Page 54: Linked Data at present Using Linked Data
Page 55: Linked Data at present Using Linked Data

Still remember

Search and Find

?

Page 56: Linked Data at present Using Linked Data

Challenges

• Discovering relevant data sources
• Discovering useful vocabularies
• The query
• Data quality
• Finding more/useful links

Page 57: Linked Data at present Using Linked Data
Page 58: Linked Data at present Using Linked Data

Challenges (COLD 2010)
• Web scale data management (indexing, crawling, etc.)
• Query processing over multiple linked datasets
• Search in the Web of Data
• Auto-discovery
  – of URIs,
  – of additional data that is not from the authoritative source of a URI,
  – of relevant linked datasets in general
• Caching and replication
• Dataset dynamics
  – processing change notifications,
  – keeping consistency,
  – temporal tracking of linked datasets
• Reasoning on Linked Data from multiple sources
• Knowledge discovery deriving insights from the Web of Data
• Information quality of Linked Data
  – information quality assessment,
  – trustworthiness,
  – provenance
• User interface research for the interaction with the Web of Data
  – user interaction and usability,
  – visualizing Linked Data,
  – natural language interfaces

Page 59: Linked Data at present Using Linked Data

Any opinions?

Any questions?

Thank You