Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann

Freie Universität Berlin, Universität Leipzig


Post on 10-Jun-2018



Querying Wikipedia like a Database

Title

Description

Images

Languages

Infoboxes

Web Links

Categorization

Domain-specific data

Infobox Extraction

dbpedia:Albert_Einstein p:name „Albert Einstein“

dbpedia:Albert_Einstein p:birth_place dbpedia:Ulm

dbpedia:Albert_Einstein p:birth_date „1879-03-14“
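
The extraction step can be pictured with a minimal Python sketch. It assumes a much-simplified infobox syntax (one `| key = value` pair per line, plain `[[Link]]` values); the function name `infobox_to_triples` and the sample wikitext are illustrative and are not taken from the DBpedia extraction code.

```python
import re

def infobox_to_triples(page_title, infobox_wikitext):
    """Turn '| key = value' infobox lines into (subject, predicate, object) triples."""
    subject = "dbpedia:" + page_title.replace(" ", "_")
    triples = []
    for line in infobox_wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=]+?)\s*=\s*(.+?)\s*$", line)
        if not match:
            continue  # skip the template header, the closing braces, blank lines
        key, value = match.groups()
        predicate = "p:" + key.replace(" ", "_")
        # A value that is exactly one wiki link becomes a resource, anything else a literal.
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if link:
            obj = "dbpedia:" + link.group(1).replace(" ", "_")
        else:
            obj = '"' + value + '"'
        triples.append((subject, predicate, obj))
    return triples

# Simplified, hand-written sample; real infoboxes are considerably messier.
sample = """{{Infobox scientist
| name = Albert Einstein
| birth_place = [[Ulm]]
}}"""

for triple in infobox_to_triples("Albert Einstein", sample):
    print(*triple)
```

Real template values need the rule-based splitting, merging and transformation mentioned on the next slides; this sketch only covers the trivial case.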

Property Synonyms

Structuring Wikipedia's Knowledge

• Structuring actual data, not modeling the world

• Bound to Wikipedia templates; parsers handle template values based on rules (property splitting, merging, transformation)

DBpedia Ontology

• DBpedia Ontology built from scratch

• 170 classes, 900 properties

No living things

Class Hierarchy

„Select all TV Episodes …“

Template Mapping

Class TV Episode (Work)

Wikipedia Templates:

Television Episode

UK Office Episode

Simpsons Episode

DoctorWhoBox

Template Mapping

Infobox Cricketer

Infobox Historic Cricketer

Infobox Recent Cricketer

Infobox Old Cricketer

Infobox Cricketer Biography

=> Class Cricketer (Athlete)
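
A small Python sketch of the idea behind template mapping: several Wikipedia templates map to one ontology class, and each class has a superclass, so a query for a class can pick up pages regardless of which template they use. The template and class names come from the slides; the dictionaries, function names and page titles are illustrative assumptions, not the actual DBpedia mapping format.

```python
# Template name -> ontology class (names as shown on the slides).
TEMPLATE_TO_CLASS = {
    "Television Episode": "TV Episode",
    "UK Office Episode": "TV Episode",
    "Simpsons Episode": "TV Episode",
    "DoctorWhoBox": "TV Episode",
    "Infobox Cricketer": "Cricketer",
    "Infobox Historic Cricketer": "Cricketer",
    "Infobox Recent Cricketer": "Cricketer",
    "Infobox Old Cricketer": "Cricketer",
    "Infobox Cricketer Biography": "Cricketer",
}

# Class -> direct superclass, as given in parentheses on the slides.
SUPERCLASS = {"TV Episode": "Work", "Cricketer": "Athlete"}

def class_chain(template_name):
    """Return the mapped class followed by its superclasses, most specific first."""
    chain = []
    cls = TEMPLATE_TO_CLASS.get(template_name)
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS.get(cls)
    return chain

def select_instances(pages, wanted_class):
    """Select page titles whose template maps to wanted_class or one of its subclasses."""
    return [title for title, template in pages if wanted_class in class_chain(template)]

# Hypothetical pages: (title, template used on the page).
pages = [
    ("Episode A", "Simpsons Episode"),
    ("Episode B", "DoctorWhoBox"),
    ("Player C", "Infobox Old Cricketer"),
]
print(select_instances(pages, "TV Episode"))
print(select_instances(pages, "Athlete"))
```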

People

Actors

Athlete

Journalist

MusicalArtist

Politician

Scientist

Writer

Places

Airport

City

Country

Island

Mountain

River

Organisations

Band

Company

Educational Institution

Radio Station

Sports Team

Event

Convention

Military Conflict

Music Event

Sport Event

Work

Book

Broadcast

Film

Software

Television

More structured data

• Categories in SKOS

• Intra-wiki links

• Disambiguation

• Redirects

• Links to Images (and Flickr)

• Links to external webpages

• Data about 2.6 million “things”

• 274 million pieces of information (RDF triples)

Multilingual

Abstracts
– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000

DBpedia as Linked Data Hub

Semantic Web

“My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that.”

Prof. James Hendler

Web of Documents

[Figure: Web browsers and search engines access HTML documents A, B, C and D over HTTP; the documents are connected by hyperlinks.]

Web of Data

[Figure: Linked Data browsers, search engines and Linked Data mashups access data about things in sources A, B, C, D and E over HTTP; the things are connected by data links.]

Linked Data

• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.

Wikipedia Article URI: http://en.wikipedia.org/wiki/Madrid

DBpedia Resource URI: http://dbpedia.org/resource/Madrid
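
The correspondence between the two URIs can be sketched in a few lines of Python. The helper name `wikipedia_to_dbpedia` is hypothetical, and the sketch assumes only English Wikipedia article URLs of the form shown above.

```python
from urllib.parse import urlsplit

def wikipedia_to_dbpedia(article_url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    parts = urlsplit(article_url)
    prefix = "/wiki/"
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(prefix):
        raise ValueError("not an English Wikipedia article URL: " + article_url)
    # The article name is reused unchanged as the resource name.
    return "http://dbpedia.org/resource/" + parts.path[len(prefix):]

print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Madrid"))
```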

HTTP URIs

Information Resources vs. Real-World Resources

Real-world resource: http://dbpedia.org/resource/Madrid

HTTP GET -> 303 See Other, redirecting to one of:

http://dbpedia.org/page/Madrid (information resource, HTTP GET -> 200 OK)

http://dbpedia.org/data/Madrid (information resource, HTTP GET -> 200 OK)
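
The redirect pattern can be simulated locally without any network access. The sketch below is an assumption-laden model, not the server's actual logic: the function name `dereference` is made up, and choosing between `/page/` and `/data/` by looking for `application/rdf+xml` in the Accept header is a simplification of content negotiation.

```python
BASE = "http://dbpedia.org"

def dereference(uri, accept="text/html"):
    """Return (status, location) for a GET on a DBpedia-style URI (local simulation)."""
    resource_prefix = BASE + "/resource/"
    if uri.startswith(resource_prefix):
        name = uri[len(resource_prefix):]
        # A real-world resource cannot be sent over the wire, so the client is
        # redirected (303 See Other) to a document that describes it.
        if "application/rdf+xml" in accept:
            return 303, BASE + "/data/" + name
        return 303, BASE + "/page/" + name
    if uri.startswith(BASE + "/page/") or uri.startswith(BASE + "/data/"):
        # Information resources are served directly.
        return 200, uri
    return 404, None

status, location = dereference("http://dbpedia.org/resource/Madrid")
print(status, location)
print(dereference(location))
```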

[Figure: Linking Open Data cloud, with data sets grouped into Music, Online Activities, Publications, Geographic, Cross-Domain and Life Sciences]

4.5 billion triples, 180 million data links

Use Cases

1. Data source for web applications

2. Querying Wikipedia like a database

3. Tag Web content with concepts instead of free-text tags

4. Vocabulary and semantic backbone for enterprise linked data integration

DBpedia as data source

• Embed structured information from Wikipedia into your web applications

• Build (mobile) maps applications using DBpedia data about places

• Display multilingual titles & descriptions in 15 languages

DBpedia Mobile

SPARQL Endpoint

http://dbpedia.org/sparql
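
A hedged sketch of what a request to the endpoint could look like, in Python. It only builds the GET URL and sends nothing. The `query` parameter is the one defined by the SPARQL protocol; the class URI `http://dbpedia.org/ontology/TelevisionEpisode` is an assumption chosen to match the "Select all TV Episodes" example and is not confirmed by the slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed class URI; adjust to whatever the ontology actually publishes.
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?episode WHERE {
  ?episode rdf:type <http://dbpedia.org/ontology/TelevisionEpisode> .
} LIMIT 10
"""

def build_request_url(endpoint, sparql):
    """Encode a SPARQL query as a GET request URL for the given endpoint."""
    return endpoint + "?" + urlencode({"query": sparql.strip()})

url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The URL can then be fetched with any HTTP client; the result format depends on the Accept header sent with the request.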

Wikipedia Query

Annotating Documents

• Use DBpedia concepts to annotate documents instead of free-text tags

• Named Entity Extraction systems already use DBpedia URIs (OpenCalais, Muddy Boots)

• Social bookmarking with DBpedia URIs as tags: www.faviki.com

„Apple“

http://dbpedia.org/resource/Apple_Inc.

http://dbpedia.org/resource/Apple_(fruit)

http://dbpedia.org/resource/Apple_Records
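
Why concept URIs beat the free-text tag "Apple" can be shown with a toy Python disambiguator. The three URIs are the ones listed above; the keyword sets and the overlap-counting heuristic are invented for illustration and do not describe how OpenCalais, Muddy Boots or Faviki work.

```python
# Candidate concepts for the ambiguous tag "Apple", each with hypothetical context keywords.
CANDIDATES = {
    "http://dbpedia.org/resource/Apple_Inc.": {"iphone", "mac", "company"},
    "http://dbpedia.org/resource/Apple_(fruit)": {"tree", "orchard", "juice"},
    "http://dbpedia.org/resource/Apple_Records": {"beatles", "label", "album"},
}

def pick_concept(text):
    """Return the candidate URI whose keywords overlap most with the text, or None."""
    words = set(text.lower().split())
    best_uri, best_score = None, 0
    for uri, keywords in CANDIDATES.items():
        score = len(words & keywords)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri

print(pick_concept("The Beatles released the album on their own label"))
print(pick_concept("fresh juice from the orchard"))
```

Once a document is tagged with one of these URIs, the tag is unambiguous and links directly to structured data about the concept.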

Annotating Documents

• BBC editors tag news articles with DBpedia concepts

• DBpedia Lookup Service: http://lookup.dbpedia.org

Linking Enterprise Data

Take the Linking Open Data approach to the enterprise

Linking Enterprise Data

• Connect data sets with DBpedia as shared vocabulary

• Enable meaningful navigation paths across BBC websites

• Browsing Madonna-related information across BBC News, BBC Music, BBC Programmes, …

• Make use of the rich background information: relate the release of a music album to a news article about the artist
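
A minimal Python sketch of using DBpedia URIs as the shared vocabulary between data sets. Everything concrete in it is hypothetical: the item titles, the data layout and the exact resource URI used for Madonna are placeholders, and the BBC's real integration is not described in these slides.

```python
from collections import defaultdict

# Assumed resource URI, used only as a shared key in this example.
MADONNA = "http://dbpedia.org/resource/Madonna"

def link_by_concept(datasets):
    """Index items from all data sets by the DBpedia URIs they are tagged with."""
    index = defaultdict(list)
    for site, items in datasets.items():
        for title, concept_uris in items:
            for uri in concept_uris:
                index[uri].append((site, title))
    return dict(index)

# Hypothetical items: (title, list of DBpedia concept URIs attached by editors).
datasets = {
    "BBC News": [("News article about the artist", [MADONNA])],
    "BBC Music": [("Album release page", [MADONNA])],
    "BBC Programmes": [("Unrelated programme", [])],
}

links = link_by_concept(datasets)
for site, title in links[MADONNA]:
    print(site, "-", title)
```

Because both items carry the same URI, a navigation path from the album release to the news article falls out of a simple lookup.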

The Future of DBpedia

Improve Information Extraction

Crowd-source Information Extraction

Crowd-Sourced Extraction

Where's the user benefit?

Data Fusion

Cross-Language Data Fusion

• 264 Wikipedia editions in different languages
– Italian Wikipedians know more about Italian villages
– German Wikipedia contains more person infoboxes

• Augment the infobox dataset with facts from other Wikipedia editions.
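
One simple way to picture cross-language fusion, as a Python sketch: take each property from the first edition in a priority list that provides it, and remember where it came from. The fusion rule, the function name `fuse_infoboxes` and the sample values are assumptions for illustration; the slides do not specify a fusion strategy.

```python
def fuse_infoboxes(editions, priority):
    """Merge per-language infobox facts, preferring editions earlier in `priority`.

    editions: {language: {property: value}}
    Returns {property: (value, source_language)}.
    """
    fused = {}
    for lang in priority:
        for prop, value in editions.get(lang, {}).items():
            if prop not in fused:
                fused[prop] = (value, lang)
    return fused

# Invented example data: the Italian edition knows more about an Italian village.
editions = {
    "en": {"name": "Example village"},
    "it": {"name": "Example village", "population_total": "1234"},
}

fused = fuse_infoboxes(editions, priority=["en", "it"])
for prop, (value, lang) in fused.items():
    print(prop, "=", value, "(from " + lang + ")")
```

Keeping the source language next to each value makes it possible to flag conflicts between editions later on.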

Augment DBpedia with External Data

• Linking Open Data cloud provides more data than Wikipedia
– Eurostat provides additional statistical information about countries.
– MusicBrainz contains additional information about other bands.
– GeoNames provides additional information about locations.

• Idea
– Augment DBpedia with additional data from external sources.

Contribute back to Wikipedia

• Opportunity
– Feed data back to Wikipedia

• Extend the Wikipedia authoring environment with
– Suggestions for infobox values
– Cross-language consistency checking for infoboxes

• Currently going on
– New maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)

Contribute back to Wikipedia

• Initialize Wikipedia Clean-Up Cycles
– Data-driven search interfaces expose the weaknesses of the Wikipedia template system.
– Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more stringently.

Live Update

• Current Situation
– DBpedia update cycle: 3 months
– Wikipedia provides us with access to the live update stream

• Opportunity
– Increase the currency of the DBpedia dataset using this update stream

• Result
– DBpedia in synchronization with Wikipedia.

Open Source

Open Data

What is the Wikipedia for Data?

Wikipedia is the Wikipedia for Data

Summary

http://dbpedia.org

georgi.kobilarov@fu-berlin.de