Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann
Freie Universität Berlin, Universität Leipzig
Infobox Extraction
dbpedia:Albert_Einstein p:name "Albert Einstein"
dbpedia:Albert_Einstein p:birth_place dbpedia:Ulm
dbpedia:Albert_Einstein p:birth_date "1879-03-14"
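The three triples above come from the key/value pairs of a single infobox. A minimal Python sketch of that step follows, assuming a simplified wikitext layout with one `| key = value` line per property. The helper names, the expansion of `p:` to `http://dbpedia.org/property/`, and the three value rules (wiki link to resource, `{{birth date|…}}` to ISO date, anything else to a plain literal) are illustrative assumptions, not the DBpedia extraction code.

```python
import re

RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"  # assumed expansion of the p: prefix

LINK = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")
BIRTH_DATE = re.compile(
    r"\{\{\s*birth date\s*\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})[^}]*\}\}",
    re.IGNORECASE,
)


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Collect '| key = value' lines; multi-line nested templates are not handled."""
    props = {}
    for line in wikitext.splitlines():
        m = re.match(r"\s*\|\s*([\w ]+?)\s*=\s*(.*?)\s*$", line)
        if m and m.group(2):
            props[m.group(1).replace(" ", "_")] = m.group(2)
    return props


def to_object(value: str) -> str:
    """Turn one infobox value into an N-Triples object term."""
    link = LINK.fullmatch(value)
    if link:  # [[Ulm]] -> <http://dbpedia.org/resource/Ulm>
        return "<" + RESOURCE + link.group(1).strip().replace(" ", "_") + ">"
    date = BIRTH_DATE.fullmatch(value)
    if date:  # {{birth date|1879|3|14}} -> "1879-03-14"
        year, month, day = (int(g) for g in date.groups())
        return f'"{year:04d}-{month:02d}-{day:02d}"'
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_triples(page: str, props: dict[str, str]) -> list[tuple[str, str, str]]:
    """One (subject, predicate, object) triple per infobox property."""
    subject = "<" + RESOURCE + page + ">"
    return [(subject, "<" + PROPERTY + key + ">", to_object(value))
            for key, value in props.items()]


if __name__ == "__main__":
    sample = """{{Infobox scientist
| name        = Albert Einstein
| birth_place = [[Ulm]]
| birth_date  = {{birth date|1879|3|14}}
}}"""
    for s, p, o in to_triples("Albert_Einstein", parse_infobox(sample)):
        print(s, p, o, ".")
```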
Structuring Wikipedia's Knowledge
• Structuring actual data, not modeling the world
• Bound to Wikipedia templates: parsers handle template values based on rules (property splitting, merging, transformation)
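Two of the rule types named above can be sketched in a few lines. In this hypothetical Python example, splitting turns one value that holds several wiki links into several resource names, and merging combines separate `<prefix>_year`, `<prefix>_month` and `<prefix>_day` fields into one date. The function names and the field-naming convention are assumptions chosen for illustration.

```python
import re

LINK = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")


def split_links(value: str) -> list[str]:
    """Property splitting: one value holding several wiki links yields one
    resource name per link target (piped link text is ignored)."""
    return [target.strip().replace(" ", "_") for target in LINK.findall(value)]


def merge_date(props: dict[str, str], prefix: str) -> dict[str, str]:
    """Property merging: <prefix>_year, <prefix>_month and <prefix>_day
    become a single ISO-formatted <prefix>_date value."""
    keys = [f"{prefix}_{part}" for part in ("year", "month", "day")]
    if not all(k in props for k in keys):
        return dict(props)  # nothing to merge; return an unchanged copy
    year, month, day = (int(props[k]) for k in keys)
    merged = {k: v for k, v in props.items() if k not in keys}
    merged[f"{prefix}_date"] = f"{year:04d}-{month:02d}-{day:02d}"
    return merged
```

For example, `split_links("[[Ulm]], [[Kingdom of Württemberg]]")` yields two resource names, so the property produces two triples instead of one literal.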
Template Mapping
Class TV Episode (Work)
Wikipedia Templates: Television Episode, UK Office Episode, Simpsons Episode, DoctorWhoBox
Template Mapping
Infobox Cricketer, Infobox Historic Cricketer, Infobox Recent Cricketer, Infobox Old Cricketer, Infobox Cricketer Biography
=> Class Cricketer (Athlete)
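The two mapping slides boil down to a lookup table from many template names to one class, plus the superclass shown in parentheses. A hypothetical Python sketch: the class identifiers (`TVEpisode`, `Cricketer`) and the case-insensitive name normalization are simplifying assumptions (MediaWiki itself only ignores the case of the first letter).

```python
def normalize(template: str) -> str:
    """Treat underscores as spaces and ignore case when comparing names."""
    return " ".join(template.replace("_", " ").split()).lower()


# Hypothetical mapping table taken from the two slides: many templates, one class.
TEMPLATE_TO_CLASS = {
    "Television Episode": "TVEpisode",
    "UK Office Episode": "TVEpisode",
    "Simpsons Episode": "TVEpisode",
    "DoctorWhoBox": "TVEpisode",
    "Infobox Cricketer": "Cricketer",
    "Infobox Historic Cricketer": "Cricketer",
    "Infobox Recent Cricketer": "Cricketer",
    "Infobox Old Cricketer": "Cricketer",
    "Infobox Cricketer Biography": "Cricketer",
}

# Superclasses as given in parentheses on the slides.
SUPERCLASS = {"TVEpisode": "Work", "Cricketer": "Athlete"}

_LOOKUP = {normalize(name): cls for name, cls in TEMPLATE_TO_CLASS.items()}


def classes_for(template: str) -> list[str]:
    """Mapped class followed by its superclasses, most specific first."""
    cls = _LOOKUP.get(normalize(template))
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS.get(cls)
    return chain
```

Keeping the superclass chain separate from the template table means a new template only needs one entry to inherit the whole hierarchy.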
More structured data
• Categories in SKOS
• Intra-wiki links
• Disambiguation
• Redirects
• Links to images (and Flickr)
• Links to external webpages
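"Categories in SKOS" means each Wikipedia category becomes a `skos:Concept` and its parent categories become `skos:broader` links. A small Python sketch of that mapping; the category URI pattern `http://dbpedia.org/resource/Category:…` and the example category names are assumptions for illustration, while the SKOS and RDF namespaces are the standard W3C ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CATEGORY = "http://dbpedia.org/resource/Category:"  # assumed URI pattern


def uri(name: str) -> str:
    """N-Triples URI term for a category name."""
    return "<" + CATEGORY + name.replace(" ", "_") + ">"


def category_triples(category: str, parents: list[str]) -> list[tuple[str, str, str]]:
    """Describe one category as a skos:Concept, with a label and one
    skos:broader link per parent category."""
    subject = uri(category)
    label = category.replace('"', '\\"')
    triples = [
        (subject, "<" + RDF_TYPE + ">", "<" + SKOS + "Concept>"),
        (subject, "<" + SKOS + "prefLabel>", f'"{label}"@en'),
    ]
    for parent in parents:
        triples.append((subject, "<" + SKOS + "broader>", uri(parent)))
    return triples
```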
Multilingual Abstracts
– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000
Semantic Web
“My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that.”
Prof. James Hendler
Web of Documents
[Figure: Web browsers and search engines retrieve HTML documents (A, B, C, D) over HTTP; the documents are connected by hyperlinks.]
Web of Data
[Figure: Linked Data browsers, Linked Data mashups and search engines access things (A, B, C, D, E) over HTTP; the things are connected by data links.]
Linked Data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.
Wikipedia Article URI: http://en.wikipedia.org/wiki/Madrid
DBpedia Resource URI: http://dbpedia.org/resource/Madrid
HTTP URIs
Information Resources
http://dbpedia.org/page/Madrid
HTTP GET -> 200 OK
Real-World Resources
http://dbpedia.org/resource/Madrid
HTTP GET -> 303 See Other -> http://dbpedia.org/page/Madrid (HTML) or http://dbpedia.org/data/Madrid (RDF) -> 200 OK
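The 303 pattern above can be sketched as a tiny server-side routing function: a request for a real-world resource is answered with a redirect to either the HTML page or the RDF data, depending on the Accept header. This is a simplified sketch, assuming a plain substring check on the Accept header (no q-value parsing) and a hypothetical list of RDF media types; it is not DBpedia's actual server logic.

```python
from typing import Optional, Tuple

BASE = "http://dbpedia.org"
# Media types treated as "the client wants RDF" (assumed, not exhaustive).
RDF_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")


def respond(path: str, accept: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header) for a GET request."""
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        # A real-world thing has no document of its own: redirect with 303.
        wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
        target = "/data/" if wants_rdf else "/page/"
        return 303, BASE + target + name
    if path.startswith(("/page/", "/data/")):
        return 200, None  # information resource: served directly
    return 404, None
```

For example, `respond("/resource/Madrid", "text/html")` redirects to `http://dbpedia.org/page/Madrid`. A production server would honor q-values so that `text/html, application/rdf+xml;q=0.1` still prefers HTML.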
Use Cases
1. Data source for Web applications
2. Querying Wikipedia like a database
3. Tag Web content with concepts instead of free-text tags
4. Vocabulary and semantic backbone for enterprise linked data integration
DBpedia as data source
• Embed structured information from Wikipedia into your web applications
• Build (mobile) maps applications using DBpedia data about places
• Display multilingual titles & descriptions in 15 languages
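To fetch a title in a given language, an application can send a SPARQL query to the public endpoint. The sketch below builds such a query for `rdfs:label` filtered by language and encodes it as the `query` parameter defined by the SPARQL protocol; the result format would be chosen via the HTTP Accept header. The function names are hypothetical, and the resource name and language tag are inserted unescaped, so only trusted values should be passed.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def label_query(resource: str, lang: str) -> str:
    """SPARQL query for the rdfs:label of one DBpedia resource in one language."""
    return (
        "SELECT ?label WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> "
        "<http://www.w3.org/2000/01/rdf-schema#label> ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def request_url(query: str) -> str:
    """GET URL that passes the query in the SPARQL protocol's 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})
```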
Annotating Documents
• Use DBpedia concepts to annotate documents instead of free-text tags
• Named entity extraction systems already use DBpedia URIs (OpenCalais, Muddy Boots)
• Social bookmarking with DBpedia URIs as tags (www.faviki.com)
"Apple"
http://dbpedia.org/resource/Apple_Inc.
http://dbpedia.org/resource/Apple_(fruit)
http://dbpedia.org/resource/Apple_Records
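Choosing among the three "Apple" URIs is a disambiguation problem. A deliberately naive Python sketch scores each candidate by keyword overlap with the surrounding text. The keyword sets are hypothetical; a real annotator would derive context from the articles' text, links or categories, and this is not how OpenCalais or Muddy Boots work.

```python
import re

# Hypothetical context keywords per candidate URI.
CANDIDATES = {
    "http://dbpedia.org/resource/Apple_Inc.": {"iphone", "mac", "computer", "software", "company"},
    "http://dbpedia.org/resource/Apple_(fruit)": {"fruit", "tree", "orchard", "eat", "juice"},
    "http://dbpedia.org/resource/Apple_Records": {"beatles", "record", "label", "album", "music"},
}


def disambiguate(context: str) -> str:
    """Pick the candidate whose keywords overlap most with the context.
    Ties go to the first candidate in dictionary order."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    return max(CANDIDATES, key=lambda uri: len(CANDIDATES[uri] & words))
```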
Annotating Documents
• BBC editors tag news articles with DBpedia concepts
• DBpedia Lookup Service: http://lookup.dbpedia.org
• Connect data sets with DBpedia as shared vocabulary
• Enable meaningful navigation paths across BBC websites
• Browsing Madonna-related information across BBC News, BBC Music, BBC Programmes, …
• Make use of the rich background information: relate the release of a music album to a news article about the artist
Linking Enterprise Data
Cross-Language Data Fusion
• 264 Wikipedia editions in different languages
  – Italian Wikipedians know more about Italian villages
  – German Wikipedia contains more person infoboxes
• Augment the infobox dataset with facts from other Wikipedia editions.
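One simple fusion policy is a language priority list: take each property from the first edition that has it and let later editions fill only the gaps. The Python sketch below assumes the property names have already been aligned across languages (for example through template mappings), which is the hard part in practice; the function name and policy are illustrative, not the method used by DBpedia.

```python
def fuse(editions: dict[str, dict[str, str]],
         priority: list[str]) -> dict[str, tuple[str, str]]:
    """Map each property to (value, language) from the first edition in
    `priority` that has it; later editions only fill gaps."""
    fused: dict[str, tuple[str, str]] = {}
    for lang in priority:
        for prop, value in editions.get(lang, {}).items():
            fused.setdefault(prop, (value, lang))  # keep the earlier edition's value
    return fused
```

Recording the source language next to each value keeps provenance, so a consumer can tell which facts came from, say, the Italian edition.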
Augment DBpedia with External Data
• The Linking Open Data cloud provides more data than Wikipedia
  – Eurostat provides additional statistical information about countries.
  – MusicBrainz contains additional information about other bands.
  – GeoNames provides additional information about locations.
• Idea
  – Augment DBpedia with additional data from external sources.
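Linking Open Data sources are typically connected to DBpedia through `owl:sameAs` links, so augmentation can copy facts about a linked external URI onto the DBpedia URI. A one-hop Python sketch, assuming triples as plain string tuples; the `example.org` URIs in any test data are placeholders, and real data would also need conflict handling and multi-hop links.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

Triple = tuple[str, str, str]


def augment(dbpedia: list[Triple], external: list[Triple]) -> list[Triple]:
    """Copy facts about owl:sameAs-linked external URIs onto the DBpedia URI."""
    # external URI -> DBpedia URI, read from the sameAs links in the DBpedia data
    alias = {o: s for s, p, o in dbpedia if p == OWL_SAME_AS}
    result = list(dbpedia)
    seen = set(dbpedia)
    for s, p, o in external:
        if s in alias:
            rewritten = (alias[s], p, o)
            if rewritten not in seen:  # avoid duplicate triples
                seen.add(rewritten)
                result.append(rewritten)
    return result
```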
Contribute back to Wikipedia
• Opportunity
  – Feed data back to Wikipedia
• Extend the Wikipedia authoring environment with
  – Suggestions for infobox values
  – Cross-language consistency checking for infoboxes
• Currently going on
  – New maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)
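Cross-language consistency checking for infoboxes can start from a simple comparison: group values by property and report properties whose values differ between editions. The Python sketch below compares values after stripping whitespace only; it assumes aligned property names, and real values would need normalization of units, number formats and dates before comparison.

```python
from collections import defaultdict


def inconsistencies(editions: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return {property: {lang: value}} for properties whose values disagree
    between language editions."""
    by_prop: dict[str, dict[str, str]] = defaultdict(dict)
    for lang, props in editions.items():
        for prop, value in props.items():
            by_prop[prop][lang] = value.strip()
    return {prop: vals for prop, vals in by_prop.items() if len(set(vals.values())) > 1}
```

An authoring tool could show the returned values side by side as a suggestion to the editor.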
Contribute back to Wikipedia
• Initiate Wikipedia clean-up cycles
  – Data-driven search interfaces expose the weaknesses of the Wikipedia template system.
  – Seeing their preferred items missing from end-user interfaces may motivate Wikipedia editors to use templates more stringently.
Live Update
• Current situation
  – DBpedia update cycle: 3 months
  – Wikipedia provides us with access to the live update stream
• Opportunity
  – Increase the currency of the DBpedia dataset using this update stream
• Result
  – DBpedia in synchronization with Wikipedia
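With an update stream, each edited page can be re-extracted and its old triples replaced, instead of rebuilding the whole dataset every few months. A Python sketch of that bookkeeping, assuming a hypothetical `PageChange` event that already carries the re-extracted triples; it returns the added and removed triples so the same diff could be applied to a triple store.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class PageChange:
    """One event from a (hypothetical) Wikipedia update stream."""
    page: str
    triples: set[Triple] = field(default_factory=set)  # re-extracted triples
    deleted: bool = False


def apply_change(store: dict[str, set[Triple]],
                 change: PageChange) -> tuple[set[Triple], set[Triple]]:
    """Replace every triple extracted from the changed page and
    return (added, removed)."""
    old = store.get(change.page, set())
    new = set() if change.deleted else set(change.triples)
    if change.deleted:
        store.pop(change.page, None)
    else:
        store[change.page] = new
    return new - old, old - new
```

Keying the store by source page is the design choice that makes retraction easy: every triple knows which page it came from.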