dbpedia - uni-mannheim.dewifo5-03.informatik.uni-mannheim.de/bizer/pub/tu... · outline 1. the...

53
DBpedia DBpedia and the and the Web of Data Web of Data Prof. Dr. Chris Bizer F i Ui ität B li Freie Universität Berlin Berlin. November 28, 2008

Upload: doanmien

Post on 02-May-2018

229 views

Category:

Documents


1 download

TRANSCRIPT

DBpediaDBpediaand theand the

Web of DataWeb of DataProf. Dr. Chris Bizer

F i U i ität B liFreie Universität Berlin

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)Berlin. November 28, 2008

Hello

Name Chris Bizer

Job Junior- Professor at Freie Universität Berlin

Projects Projects RAP - RDF API for PHP (together with Universität Leipzig) D2RQ und D2R Server (together with HP Labs Bristol) D2RQ und D2R Server (together with HP Labs Bristol) Named Graphs and NG4J (together with HP Labs Bristol) Fresnel Display Vocabulary (together with MIT and INRIA) Fresnel Display Vocabulary (together with MIT and INRIA) DBpedia (together with Universität Leipzig and OpenLink) Linking Open Data (community project sponsored by W3C) Linking Open Data (community project sponsored by W3C)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Outline

1. The DBpedia Project

2. The Web of Data

3. Linked Data Deployment on the Web

4. Linked Data Applications

5. What is next?

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

DBpedia

DBpedia is a community effort to extract structured information from Wikipedia make this information available on the Web under an open license interlink the DBpedia dataset with other open datasets on the Web

Contributors Freie Universität Berlin (Germany) Universität Leipzig (Germany) OpenLink Software (UK)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Structured Information in Wikipedia

Wikipedia consists of 11.7 million articles (2.62 million in English) in 264 languages monthly growth-rate: 4%

Wikipedia articles contain structured information infoboxes which use a template mechanism categorization of the article images depicting the article’s topic links to external webpages intra-wiki links to other articles inter-language links to articles about the same topic

in different languages

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Extracting Structured Information from Wikipedia

http://en.wikipedia.org/wiki/Calgary

<http://dbpedia.org/resource/Calgary>

dbpedia:native_name “Calgary” ;

dbpedia:elevation “1048” ;

dbpedia:population_city “988193” ;

db di l ti t “1079310”dbpedia:population_metro “1079310” ;

mayor_name

dbpedia:Dave Bronconnier ;dbpedia:Dave_Bronconnier ;

governing_body

dbpedia:Calgary_City_Council ;_ _

...

using a PHP extraction frameworkusing a PHP extraction framework

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

The DBpedia Dataset

Data about 2.6 million “things” including at least including at least 213,000 persons 328,000 places p 57,000 music albums 36,000 films 20,000 companies 80,000 species

Altogether 247 million pieces of information (RDF triples) 29 million triples originate from infoboxes p g 588,000 links to pictures 3,150,000 links to relevant external web pages 4,878,000 RDF links into other databases 414,000 Wikipedia categories 75 000 YAGO categories

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

75,000 YAGO categories

Multi-Lingual Abstracts

The dataset contains a short and a long abstract for each concept.concept.

Short abstracts English: 2 600 000 English: 2,600,000 German: 391,000 French: 383 000 French: 383,000 Dutch: 284,000 Polish: 256 000 Polish: 256,000 Italian: 286,000 Spanish: 226 000 Spanish: 226,000 Japanese: 199,000 Portuguese: 246 000 Portuguese: 246,000 Swedish: 144,000 Chinese 101 000

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Chinese: 101,000

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Accessing the DBpedia Dataset over the Web

1. SPARQL Endpoint

2 Linked Data Interface2. Linked Data Interface

3. Data Dumps for Download

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

The DBpedia SPARQL Endpoint

http://dbpedia.org/sparql

hosted on a OpenLink Virtuoso server

can answer SPARQL queries like can answer SPARQL queries like Give me all Sitcoms that are set in NYC? All German musicians that were born in Berlin in the 19th century? All German musicians that were born in Berlin in the 19th century? All tennis players from Moscow? All films by Quentin Tarentino? All films by Quentin Tarentino? All soccer players with tricot number 11, playing for a club having a

stadium with over 40,000 seats and is born in a country with over 10 million inhabitants?

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Improving Wikipedia Search

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

2. The Web of Data

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

The Web of Documents

The Web is a single information space b ild t d d d h li kbuild on open standards and hyperlinks.

Web Browsers

Search Engines

HTTP

HTML HTML HTMLhyper h h

HTMLhyperlinks

hyperlinks

hyperlinks

A B C D

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

A B C D

Linked Data

Use RDF and HTTP to1. publish structured data on the Web,2. set links between data from one data source2. set links between data from one data source

to data within other data sources.

Thing Thing Thing Thing ThingThing

Thing

Thing

Thing

Thing

Thing Thing

Thing

Thing

Thing

datalink

datalink

datalink

datalink

B CA D E

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Linked Data Principles

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3 When someone looks up a URI provide useful RDF3. When someone looks up a URI, provide useful RDF information.

4 I l d RDF t t t th t li k t th URI th t4. Include RDF statements that link to other URIs so that they can discover related things.

Tim Berners-Lee 2007

htt // 3 /D i I /Li k dD t ht lhttp://www.w3.org/DesignIssues/LinkedData.html

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

The RDF Data Model

rdf:type

f f

foaf:Personrdf:type

pd:cygri

Richard Cyganiakfoaf:name

foaf:based neardbpedia:Berlin

foaf:based_near

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Data objects are identified with HTTP URIs

rdf:typepd:cygri

f f

foaf:Personrdf:type

Richard Cyganiakfoaf:name

foaf:based neardbpedia:Berlin

foaf:based_near

pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygridbpedia:Berlin = http://dbpedia.org/resource/Berlindbpedia:Berlin http://dbpedia.org/resource/Berlin

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Dereferencing URIs over the Web

rdf:type

3 405 259f f

foaf:Personrdf:type

pd:cygri

3.405.259dp:populationRichard Cyganiak

foaf:name

foaf:based near

skos:subject

dbpedia:Berlinfoaf:based_near

d Citi i G

skos:subject

dp:Cities_in_Germany

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Dereferencing URIs over the Web

rdf:type

3 405 259f f

foaf:Personrdf:type

pd:cygri

3.405.259dp:populationRichard Cyganiak

foaf:name

foaf:based near

skos:subject

dbpedia:Berlinfoaf:based_near

d Citi i G

skos:subject

db di H bskos:subject

dp:Cities_in_Germanydbpedia:Hamburg

dbpedia:Muenchen skos:subject

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

The Disco – Hyperdata Browser

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

3. Linked Data Deployment on the Web

Is this real?

Thing Thing Thing Thing Thing

Thing Thing Thing Thing Thing

typedlinks

typedlinks

typedlinks

typedlinks

B CA D E

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

W3C Linking Open Data Project

Community effort toy publish existing open license datasets as Linked Data on the Web interlink things between different data sources

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

LOD Datasets on the Web: May 2007

Over 500 million RDF triples

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Around 120,000 RDF links between data sources

LOD Datasets on the Web: August 2007

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

LOD Datasets on the Web: February 2008

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

LOD Datasets on the Web: September 2008

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Triple Count

More than 2 billion RDF triplesMore than 2 billion RDF triples

More than 5 million links between datasets.(rough estimates)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Organizations publishing Linked Data

Universities and Research Institutes Massachusetts Institute of Technology (USA) University of Southampton (UK) Freie Universität Berlin (DE) DERI (IRE) Companies

BBC (UK) KMi, Open University (UK) University of London (UK)

BBC (UK) OpenLink (UK) Zitgist (USA)

Universität Hannover (DE) University of Pennsylvania (USA)

Zitgist (USA) Talis (UK) Garlik (UK)

Universität Leipzig (DE) Universität Karlsruhe (DE)

( ) Mondeca (FR) Cyc Foundation

Joanneum (AT) University of Toronto (CA)

y(USA)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

The Bio2RDF Project

Goals1. Make bioinformatics data available in RDF format on the Web.1. Make bioinformatics data available in RDF format on the Web.2. Promote the linked data vision within the bioinformatics community. 3. Answer questions which were not possible or practical to ask before.

Participants Université Laval, Canada Queensland University of Technology, Australia

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

The Bio2RDF Cloud

27 data sources

260 million records

2,7 billion RDF triples

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

4. Applications

What can I do with this?

Search Linked DataLinked DataEnginesMashupsBrowsers

Thing

Thing

Thing

Thing

Thing

Thing Thing

Thing

Thing

Thing

typedlinks

typedlinks

typedlinks

typedlinks

Thing Thing Thing Thing Thing

B C

links

A D E

links links links

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

B CA D E

Linked Data Browsers

Tabulator Browser (MIT, USA)

Disco Hyperdata Browser (FU Berlin, DE)Disco Hyperdata Browser (FU Berlin, DE)

OpenLink RDF Browser (OpenLink, UK)

Zitgist RDF Browser (Zitgist, USA)

Humboldt (HP Labs, UK)( , )

Fenfire (DERI, Irland)

Marbles (FU Berlin, DE)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Tabulator

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Linked Data Mashups

D i ifi li ti i Li k d D t f th W bDomain-specific applications using Linked Data from the Web

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

DBtune Slashfacet

Visualizes music-related Linked DataUses LastFM MySpace and BBC dataUses LastFM, MySpace, and BBC data

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

DBpedia Mobile

Geospatial entry point into the Web of Datainto the Web of Data

Starts with DBpedia, R d Fli k d tRevyu and Flickr data

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Semantic Web Pipes

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Web of Data Search Engines

Falcons (IWS, China)

Sindice (DERI, Ireland)

MicroSearch (Yahoo, Spain)( , p )

Watson (Open University, UK)

SWSE (DERI, Ireland)

Swoogle (UMBC, USA)g ( )

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Falcons

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

5. What is next for DBpedia?

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Better Interfaces for Common Wikipedia Users

Direction: free-text search + facet-browsing

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Live Update

Current Situation DBpedia update cycle: 3 month Wikimedia Foundation provides us with live update stream

Opportunity Increase the currency of the DBpedia dataset using this update stream

Result DBpedia in synchronization with Wikipedia DBpedia in synchronization with Wikipedia.

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Cross-Language Data Fusion

Opportunity there are 264 Wikipedia Editions in different languages. there are cross-language links. the Italian Wikipedia knows more about Italian villages then

the English one. the German Wikipedia contains more person infoboxes than the German Wikipedia contains more person infoboxes than

the English one.

Idea Idea Augment the infobox dataset with facts from other Wikipedia editions.

Result A much richer DBpedia dataset.

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Augment DBpedia with Data from External Sources

Opportunity the Linking Open Data cloud provides lots of useful data

which is not contained in Wikipedia yet. For instance: For instance:

- EuroStat provides additional statistical information about countries.- Musicbrainz contains additional information about other bands.- Geonames provides additional information about locations.

Idea Augment DBpedia with additional data from external sources.

ResultResult A much richer DBpedia dataset.

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Contribute back to the Wikipedia Community

Opportunity augmentation with data from the LOD cloud makes the DBpedia dataset

richer than Wikipedia itself. infobox data is extracted from Wikipedia editions in various languages infobox data is extracted from Wikipedia editions in various languages.

Idea Extend the Wikipedia authoring environment with

- Suggestions for infobox values- Cross-language consistency checking for infoboxes- Cross-language consistency checking for infoboxes

Initialize Wikipedia Clean-Up Cycles Initialize Wikipedia Clean-Up Cycles Data-driven search interfaces expose the weaknesses of Wikipedia

template system. Preferred items not showing up in end-user interfaces may motivate

Wikipedia editors to use templates more stringently.

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

What is next for the Web of Data?

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Linking

1 Increase the amount of links between datasets1. Increase the amount of links between datasets2. Increase the quality of these links

Today: Simple pattern- and graph-matching based techniques used for automated interlinking.

There is lots of existing work in database and knowledge representation communities on identity resolution to berepresentation communities on identity resolution to be used.

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Data Fusion

Application Users want an integratedUsers want an integrated view on all data that is available about an object!

IntegratedView

a a ab e about a object

Raises well known but unsolved problems: S h i

owl:sameAs Schema mapping Inconsistency resolution T t / i f ti lit

DataObject 1

DataObject 3

DataObject 5

Trust / information qualityDataObject 2

DataObject 4

DataObject 6

owl:sameAsowl:sameAs

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

B CA

Thanks!

References DBpedia

http://dbpedia org/Abouthttp://dbpedia.org/About

W3C Linking Open Data Project http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Tutorial: How to Publish Linked Data on the Webhttp://www4 wiwiss fu berlin de/bizer/pub/LinkedDataTutorial/

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/