WebDB 2010, June 6th, 2010, Indianapolis, USA
The Web of Linked Data: A global public dataspace on the Web
Christian Bizer, Freie Universität Berlin
Christian Bizer: The Web of Linked Data (6/6/2010)
Outline
1. Foundations of Dataspaces and Linked Data: Where do they overlap?
2. The Web of Linked Data: What data is out there?
3. Linked Data Applications: What is being done with the data?
4. Remarks on Identity, Self-descriptive Data, and Pay-as-you-go Integration
The Dataspace Vision
Alternative to classic data integration systems in order to cope with the growing number of data sources.
Properties of dataspaces:
may contain any kind of data (structured, semi-structured, unstructured)
require no upfront investment into a global schema
provide for data-coexistence
give best-effort answers to queries
rely on pay-as-you-go data integration
Franklin, M., Halevy, A., and Maier, D.: From Databases to Dataspaces: A new Abstraction for Information Management, SIGMOD Rec. 2005.
Dataspace Architecture
Source: Franklin et al.: From Databases to Dataspaces, SIGMOD Rec. 2005.
Linked Data Principles
Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
Architecture of the classic Web
Single global information space
Small set of simple standards
1. HTML as document format
2. HTTP URLs as globally unique IDs and retrieval mechanism
3. Hyperlinks to connect everything
[Diagram: Web browsers and search engines retrieve HTML documents from servers A, B, C over HTTP; hyperlinks connect the documents]
Web 2.0 APIs and Mashups
No single global dataspace
Shortcomings
1. APIs have proprietary interfaces
2. Mashups are based on a fixed set of data sources
3. You cannot set hyperlinks between data items within different APIs
[Diagram: a mashup built on four Web APIs in front of data sources A, B, C, D]
Web APIs slice the Web into Walled Gardens
Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY
Linked Data
Extend the Web with a single global dataspace
1. by using RDF to publish structured data on the Web
2. by setting links between data items within different data sources
[Diagram: RDF data in data sources A, B, C, D, E, connected by RDF links]
The RDF Data Model
[Graph: pd:cygri rdf:type foaf:Person; pd:cygri foaf:name "Richard Cyganiak"; pd:cygri foaf:based_near dbpedia:Berlin]
Flexible graph-based data model.
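A minimal sketch of the graph on this slide as a set of (subject, predicate, object) triples. Using plain strings for terms and the helper name `match` are assumptions made for illustration; they are not taken from the talk or from any RDF library.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# Terms are written as prefixed names; real data would carry full URIs.
graph = {
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", "Richard Cyganiak"),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Where is pd:cygri based?
    print(match(graph, s="pd:cygri", p="foaf:based_near"))
```

Because the model is just a set of edges, adding a new kind of statement needs no schema change: one more tuple is enough.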
Entities are identified with HTTP URIs
[Graph: pd:cygri rdf:type foaf:Person; pd:cygri foaf:name "Richard Cyganiak"; pd:cygri foaf:based_near dbpedia:Berlin]
HTTP URIs take the role of global primary keys.
pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
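The prefixed names on this slide are shorthand for full HTTP URIs. A small sketch of the expansion, using the two namespaces given on the slide; the helper name `expand` is hypothetical.

```python
# Prefix table with the two namespaces shown on the slide.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'pd:cygri' into a full HTTP URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("pd:cygri"))
    print(expand("dbpedia:Berlin"))
```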
Resolving URIs over the Web
[Graph: pd:cygri rdf:type foaf:Person; pd:cygri foaf:name "Richard Cyganiak"; pd:cygri foaf:based_near dbpedia:Berlin. Resolving dbpedia:Berlin adds: dbpedia:Berlin dp:population 3.405.259; dbpedia:Berlin skos:subject dp:Cities_in_Germany]
The HTTP protocol brings together identification and retrieval again.
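A sketch of how a client might prepare such a lookup with Python's standard library. It only builds the request; the `Accept` value `application/rdf+xml` is one common choice and an assumption here, and the function name `build_rdf_request` is hypothetical.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_rdf_request(uri):
    """Prepare an HTTP GET request that asks for an RDF representation.

    The fragment (the part after '#') is not sent to the server, so it is
    removed before the request is built.
    """
    document_url, _fragment = urldefrag(uri)
    return Request(document_url, headers={"Accept": "application/rdf+xml"})


if __name__ == "__main__":
    req = build_rdf_request("http://richard.cyganiak.de/foaf.rdf#cygri")
    print(req.full_url, req.get_header("Accept"))
    # Passing `req` to urllib.request.urlopen would perform the lookup;
    # that step needs network access and is left out of this sketch.
```

The same URI that names the entity also tells the client where to ask for data about it, which is the coupling of identification and retrieval the slide refers to.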
Following Links deeper into the Web
[Graph as on the previous slide, extended by following the skos:subject links: dbpedia:Hamburg skos:subject dp:Cities_in_Germany; dbpedia:Muenchen skos:subject dp:Cities_in_Germany]
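Following links can be sketched as a breadth-first traversal. The dictionary `WEB` below is a toy stand-in for HTTP lookups whose contents mirror the slide; the function `follow_links` and the `max_lookups` limit are assumptions for illustration.

```python
from collections import deque

# Toy stand-in for the Web: looking up a URI returns the triples served there.
# A real client would fetch these over HTTP and would skip literal values.
WEB = {
    "pd:cygri": [
        ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ],
    "dbpedia:Berlin": [
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ],
    "dp:Cities_in_Germany": [
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    ],
}


def follow_links(start, lookup, max_lookups=10):
    """Resolve a URI, collect its triples, and queue newly seen URIs."""
    seen, queue, collected = {start}, deque([start]), set()
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in lookup(uri):
            collected.add((s, p, o))
            for term in (s, o):
                if term not in seen:
                    seen.add(term)
                    queue.append(term)
    return collected


if __name__ == "__main__":
    for triple in sorted(follow_links("pd:cygri", lambda u: WEB.get(u, []))):
        print(triple)
```

Starting from a single URI, the client discovers Hamburg and Muenchen without knowing in advance that those data items exist.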
The Disco – Hyperdata Browser
Properties of the Web of Linked Data
Global, distributed dataspace built on a simple set of standards: RDF, URIs, HTTP
Entities are connected by links, creating a global data graph that spans data sources and enables the discovery of new data sources.
Provides for data-coexistence:
Everyone can publish data to the Web of Linked Data
Everyone can express their personal view on things
Everybody can use the schemata that they like for this
2. Linked Data Deployment on the Web
Is this real?
[Diagram: RDF data in data sources A, B, C, D, E, connected by RDF links]
W3C Linking Open Data Project
Grassroots community effort to
publish existing open license datasets as Linked Data on the Web
interlink things between different data sources
LOD Datasets on the Web: May 2007
Over 500 million RDF triples
Around 120,000 RDF links between data sources
LOD Datasets on the Web: September 2008
LOD Datasets on the Web: July 2009
Over 13.1 billion RDF triples
Over 142 million RDF links between data sources
DBpedia – An Interlinking Hub in the Web of Data
DBpedia
community effort to extract structured information from Wikipedia.
provides data about 3.4 million things:
312,000 persons
140,000 organizations
413,000 places
94,000 music albums
49,000 films
146,000 species
…
provides identifiers for many common things, for instance http://dbpedia.org/resource/Calgary
overlaps with many other data sources on the Web
The LOD effort is losing track with the diagram :-)
Uptake in Life Sciences
W3C Linking Open Drug Data Effort
Bio2RDF Project
Allen Brain Atlas
Uptake in the Libraries Community
Institutions publishing Linked Data:
Library of Congress (subject headings)
German National Library (PND dataset and subject headings)
Swedish National Library (Libris - catalog)
Hungarian National Library (OPAC and Digital Library)
German Central Library of Economics (subject headings)
Workshop: Semantic Web in Bibliotheken (SWIB09), Cologne, November 24 and 25, 2009
http://www.swib09.de/
W3C Library Linked Data Incubator Group
Open Archives Object Reuse and Exchange (OAI-ORE) Standard
Uptake in the Media Industry
publish data as RDF/XML and/or
embed data into HTML using RDFa
The Structural Continuum
The Web of Linked Data is interwoven with the classic Web.
Unstructured data: HTML
Semi-structured data: RDFa embedded into HTML
Structured data: RDF/XML
Services using named entity recognition to annotate texts with Linked Data URIs:
Open Calais (Thomson Reuters) for news
Zemanta (startup) for blog posts
3. Linked Data Applications
What can I do with this?
[Diagram: Linked Data browsers, search engines, and mashups on top of Things in data sources A, B, C, D, E, connected by typed links]
Linked Data Browsers
Provide for navigating between data sources in order to explore the dataspace.
Tabulator Browser (MIT, USA)
Marbles (FU Berlin, DE)
OpenLink RDF Browser (OpenLink, UK)
Zitgist RDF Browser (Zitgist, USA)
Disco Hyperdata Browser (FU Berlin, DE)
Fenfire (DERI, Ireland)
DBpedia Mobile
Displays DBpedia data on a map
Provides for navigating into other data sources
Web of Data Search Engines
Crawl the dataspace and provide best-effort query answers over crawled data.
Falcons (IWS, China)
Sig.ma (DERI, Ireland)
Swoogle (UMBC, USA)
VisiNav (DERI, Ireland)
Watson (Open University, UK)
What are the big players doing?
Yahoo! and Google have started to crawl Linked Data in its RDFa serialization as well as Microformats.
Yahoo!
provides access to crawled data through the Yahoo BOSS API
is using the data within Yahoo Search Monkey to make search results more useful and visually appealing.
Google
uses crawled RDF data for its Social Graph API
uses crawled data to enhance search result snippets for reviews and people.
Yahoo! Search Monkey
Facebook’s Open Graph Protocol
Facebook imports RDFa data from external web sites.
For instance: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME, Yelp
4. Remarks on
1. Identity
2. Self-descriptive Data
3. Pay-as-you-go Integration
Identity
Real world objects are identified with multiple URIs.
Coupling of identification and retrieval.
Data-coexistence: Everybody can say everything about everything.
Linked Data website of our research group: http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4
Wrapper around the DBLP bibliography: http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer
Identity Resolution
Publication of owl:sameAs links on the Web.
<http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4>
owl:sameAs
<http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer> .
Pay-as-you-go Identity Management
Cheap to set up: Just put a wrapper in front of your DB (for instance using D2R Server)
Later: You or somebody else invests effort into identity resolution
related approach: iTrail hints (Vaz Salles et al., VLDB 06, 07)
How to create owl:sameAs links?
Automatically, based on declarative matching descriptions (for instance using the Silk Linking Framework)
Manually (for instance like within Ueberblick.org)
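One way a consumer might use published owl:sameAs links is to group URIs that denote the same real world object. The sketch below uses a union-find structure; this is a technique chosen for illustration, not something the talk prescribes, and the function name `same_as_clusters` is hypothetical.

```python
def same_as_clusters(links):
    """Group URIs into clusters connected by owl:sameAs links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # owl:sameAs is symmetric and transitive, so each link merges two clusters.
    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


if __name__ == "__main__":
    links = [
        ("http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4",
         "http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer"),
    ]
    print(same_as_clusters(links))
```

Links can arrive later and from anyone, which fits the pay-as-you-go approach: the clusters simply grow as more owl:sameAs statements are found.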
Pay-As-You-Go Data Integration
1. Stage: RAW DATA NOW!
don't care too much about the schema
just publish your data as RDF on the Web
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
2. Stage: Increase the usefulness of your data and ease data integration by making it self-descriptive.
Enable Clients to retrieve the Schema
Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.
Some data on the Web:
<http://richard.cyganiak.de/foaf.rdf#cygri>
foaf:name "Richard Cyganiak" ;
rdf:type <http://xmlns.com/foaf/0.1/Person> .
Resolve unknown term: http://xmlns.com/foaf/0.1/Person
RDFS or OWL definition:
<http://xmlns.com/foaf/0.1/Person>
rdf:type owl:Class ;
rdfs:label "Person" ;
rdfs:subClassOf <http://xmlns.com/foaf/0.1/Agent> ;
rdfs:subClassOf <http://xmlns.com/wordnet/1.6/Agent> .
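Once a client has retrieved such a definition, it can apply the rdfs:subClassOf statements: every instance of a class is also an instance of its superclasses. A minimal sketch, with the subclass statements from the slide hard-coded instead of fetched; the names `superclasses` and `inferred_types` are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# rdfs:subClassOf statements from the retrieved definition of foaf:Person.
SUBCLASS_OF = {
    FOAF + "Person": [FOAF + "Agent", "http://xmlns.com/wordnet/1.6/Agent"],
}


def superclasses(cls, subclass_of):
    """Return all direct and indirect superclasses of `cls`."""
    found, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def inferred_types(asserted_type, subclass_of):
    """Asserted type plus every type implied by rdfs:subClassOf."""
    return {asserted_type} | superclasses(asserted_type, subclass_of)


if __name__ == "__main__":
    for t in sorted(inferred_types(FOAF + "Person", SUBCLASS_OF)):
        print(t)
```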
Reuse Terms from Common Vocabularies
Common Vocabularies:
Friend-of-a-Friend for describing people and their social network
SIOC for describing forums and blogs
SKOS for representing topic taxonomies
Organization Ontology for describing the structure of organizations
GoodRelations for describing products and business entities
Music Ontology for describing artists, albums, and performances
Review Vocabulary provides terms for representing reviews
Common sources of identifiers (URIs) for real world objects:
LinkedGeoData and Geonames: Locations
GeneID and UniProt: Life science identifiers
DBpedia: Wide range of things
Publish Schema Mappings on the Web
Schema Mapping:
<http://xmlns.com/foaf/0.1/Person>
owl:equivalentClass
<http://dbpedia.org/ontology/Person> .
Simple Mappings: OWL (owl:equivalentClass, owl:equivalentProperty)
Complex Mappings: R2R
provides value transformation functions
structural transformations
Pay-as-you-go Aspect
1. Use a mix of common vocabularies and proprietary terms
2. You or somebody else publishes schema mappings afterwards
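A consumer could apply a simple owl:equivalentClass mapping by rewriting class URIs into its preferred vocabulary. The sketch below covers only this simple case; the subject URI under example.org is made up for illustration, and `translate_types` is a hypothetical helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The owl:equivalentClass mapping from the slide, read as
# "source class -> class in the vocabulary the consumer prefers".
EQUIVALENT_CLASS = {
    "http://dbpedia.org/ontology/Person": "http://xmlns.com/foaf/0.1/Person",
}


def translate_types(triples, mapping):
    """Rewrite rdf:type objects to the target vocabulary where a mapping exists."""
    return [
        (s, p, mapping.get(o, o) if p == RDF_TYPE else o)
        for s, p, o in triples
    ]


if __name__ == "__main__":
    data = [
        # Hypothetical subject URI, used only for this example.
        ("http://example.org/alice", RDF_TYPE, "http://dbpedia.org/ontology/Person"),
    ]
    print(translate_types(data, EQUIVALENT_CLASS))
```

Because the mapping is itself data on the Web, it can be published after the fact by the publisher or by a third party, without changing the original dataset.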
Somebody-Pays-As-You-Go
The overall data integration effort is split between the data publisher, the data consumer and third parties.
Data Publisher
publishes data as RDF
publishes data in a self-descriptive fashion
sets links and publishes mappings
Third Parties
set links pointing at your data
publish mappings to the Web
Data Consumer
has to do the rest
[Diagram: the overall data integration effort is fixed and divided into publisher's effort, third party effort, and consumer's effort]
Hands on: How to play around with Linked Data
1. Get some data using a crawler
for instance: LDspider (GPL license)
http://code.google.com/p/ldspider/
2. Store the data
using for instance: Virtuoso (GPL), Sesame (BSD), Jena TDB (BSD)
or any relational database or column store you like
decision help: Berlin SPARQL Benchmark (Nov 2009)
3. Query and analyze the data
using the SPARQL query language
SPARQL 1.1 adds support for aggregates, subqueries, negation
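To give a feel for what an aggregate query computes, here is a small Python sketch that counts triples per predicate over in-memory data. It stands in for a store and is not a SPARQL engine; the sample triples and the helper name `count_by_predicate` are assumptions.

```python
from collections import Counter

# Roughly what this SPARQL 1.1 aggregate query would return from a store:
#   SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p ORDER BY DESC(?n)


def count_by_predicate(triples):
    """Count how often each predicate occurs, most frequent first."""
    return Counter(p for _s, p, _o in triples).most_common()


if __name__ == "__main__":
    sample = [
        ("pd:cygri", "rdf:type", "foaf:Person"),
        ("pd:cygri", "foaf:name", "Richard Cyganiak"),
        ("dbpedia:Berlin", "rdf:type", "dbpedia-owl:City"),  # made-up sample triple
    ]
    for predicate, n in count_by_predicate(sample):
        print(predicate, n)
```

A predicate histogram like this is a typical first step when profiling crawled data whose schema is not known in advance.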
Shortcut: Billion Triples Challenge Dataset
Download the Billion Triples Challenge Dataset
3.2 billion triples (27GB gzipped)
crawled from the public Web of Linked Data in March/April 2010
http://challenge.semanticweb.org/
If you do something interesting with the data
submit your results to the challenge by October 1st
present your results at the 9th International Semantic Web Conference (ISWC2010), November 2010, Shanghai, China
Summary
Linked Data moves the dataspace vision to a global scale and adds the social/community aspect to it.
The Web of Linked Data is growing rapidly
active deployment communities in different domains
might have exceeded the critical mass
Great playground for experimentation
dataspace profiling
probabilistic and approximate schema mapping
data fusion, data quality, and trust
What will the user interfaces look like?
Will search engines turn into answer engines?
Thanks!
References
Overview Article: Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far
http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
Linking Open Data Project Wiki
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
Tutorial on How to Publish Linked Data on the Web
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
3rd Linked Data on the Web Workshop at WWW2010
http://events.linkeddata.org/ldow2010/