The Web of Linked Data Information Universe Seongmin Lim [email protected] Dept. of Industrial Engineering Seoul National University

Post on 21-Dec-2015

The Web of Linked Data Information Universe

Seongmin Lim

[email protected]

Dept. of Industrial Engineering

Seoul National University

Contents

Foundations of Dataspaces and Linked Data
- Where do they overlap?

The Web of Linked Data
- What data is out there?

Linked Data Applications
- What is being done with the data?

Remarks on
- Identity
- Self-descriptive Data
- Pay-as-you-go Integration

From data integration systems to dataspaces

Dataspaces were proposed to cope with a growing number of data sources.

Properties of dataspaces
- may contain any kind of data (structured, semi-structured, unstructured)
- require no upfront investment into a global schema
- provide for data-coexistence
- give best-effort answers to queries
- rely on pay-as-you-go data integration

Linked data principles

For publishing structured data on the general Web

Tim Berners-Lee

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful RDF information.

4. Include RDF statements that link to other URIs so that they can discover related things.
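
The four principles can be sketched without touching the network. This is only an illustration under stated assumptions: `TOY_WEB` is a hypothetical in-memory stand-in for HTTP lookups, the `http://example.org/...` URIs and property names are invented, and triples are plain Python tuples rather than parsed RDF.

```python
# Hypothetical in-memory stand-in for the Web: URI -> RDF-like triples
# that a server might return when the URI is looked up (principle 3).
TOY_WEB = {
    "http://example.org/people/alice": [
        ("http://example.org/people/alice", "http://example.org/vocab/name", "Alice"),
        ("http://example.org/people/alice", "http://example.org/vocab/knows",
         "http://example.org/people/bob"),
    ],
    "http://example.org/people/bob": [
        ("http://example.org/people/bob", "http://example.org/vocab/name", "Bob"),
    ],
}


def look_up(uri):
    """Simulate looking up an HTTP URI that names a thing (principles 1-3)."""
    return TOY_WEB.get(uri, [])


def linked_uris(triples):
    """Collect objects that are themselves HTTP URIs (principle 4)."""
    return [o for (_s, _p, o) in triples if o.startswith(("http://", "https://"))]


description = look_up("http://example.org/people/alice")
related = linked_uris(description)
print(related)
```

A client that repeats `look_up` on every URI in `related` discovers new descriptions one link at a time, which is the behaviour principle 4 is meant to enable.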

From the classic Web to Web 2.0

Classic Web: single global information space
1. Small set of simple standards
2. Hyperlinks to connect everything

Web 2.0: no single global dataspace
1. APIs have proprietary interfaces
2. Mashups from a fixed set of data sources
3. No hyperlinks between different APIs

Web APIs slice the Web into walled gardens.

Can’t we just publish data as files?

PDF
- Easy to read and publish

Excel
- Allows further processing and analysis

CSV
- Processing without need for proprietary tools

But…
- The structure of the data is not explained
- No connection between different data sets (data silos)
- Static and fixed: you can't retrieve just the slices relevant to a problem

Linked data

Extend the Web with a single global dataspace
- By using RDF to publish structured data on the Web
- By setting links between data items within different data sources

What is RDF?

Resource Description Framework

RDF is the data format for linked data

It’s about writing down relations between things

What is RDF for?
- So that everyone can do for data what the Web lets them do for documents
- To make the Web into a database

The essence of RDF: the ‘triple’

[Figure: a typical database table, labeled with "things" and "properties"]
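
One way to read the table-versus-triple contrast is that every cell of a table can be written as one (thing, property, value) statement. The sketch below assumes rows are things and columns are properties; the function name `table_to_triples`, the `http://example.org/` base URI, and the sample rows are all hypothetical.

```python
def table_to_triples(rows, key_column, base_uri):
    """Turn each non-key cell into a (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        # The key column names the thing; it becomes the subject URI.
        subject = base_uri + "resource/" + row[key_column]
        for column, value in row.items():
            if column == key_column:
                continue
            # Each remaining column becomes a property URI.
            predicate = base_uri + "property/" + column
            triples.append((subject, predicate, value))
    return triples


# Hypothetical table: one row per thing, one column per property.
rows = [
    {"id": "Calgary", "country": "Canada", "type": "City"},
    {"id": "Seoul", "country": "South Korea", "type": "City"},
]

triples = table_to_triples(rows, key_column="id", base_uri="http://example.org/")
for triple in triples:
    print(triple)
```

Unlike the table, the triples carry their own naming: subjects and properties are URIs, so statements from different tables can be mixed without agreeing on a schema first.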

Relations between ‘things’

Using the Web’s infrastructure

Entities are identified with HTTP URIs
- Specifically http://

Properties of the Web of Linked Data

Global, distributed dataspace built on a simple set of standards
- RDF, URIs, HTTP

Entities are connected by links
- This enables the discovery of new data sources

Provides for data-coexistence
- Everyone can publish data to the Web of Linked Data
- Everyone can express their personal view on things
- Everyone can use the schemata that they like for this

W3C Linking Open Data project

- Publish existing open-license datasets as Linked Data
- Interlink things between different data sources

2007

LOD datasets on the Web: July 2009

DBpedia

Community effort to extract structured information from Wikipedia

Provides data about 3.4 million things
- 312,000 persons
- 140,000 organizations
- 413,000 places
- 94,000 music albums
- 49,000 films
- 146,000 species
- …

Provides identifiers for many common things
- http://dbpedia.org/resource/Calgary

Overlaps with many other data sources on the Web

Uptake in many areas

Uptake in life sciences
- W3C Linking Open Drug Data effort
- Bio2RDF project
- Allen Brain Atlas

Governments, libraries, media industry, …

The structural continuum

The Web of Linked Data is interwoven with the classic Web.
- Unstructured data: HTML
- Semi-structured data: RDFa embedded in HTML
- Structured data: RDF/XML

Services use named entity recognition to annotate texts with Linked Data URIs
- Open Calais (Thomson Reuters) for news
- Zemanta (startup) for blog posts

Linked data browsers

Let users navigate between data sources in order to explore the dataspace.
- Tabulator Browser (MIT, USA)
- Marbles (FU Berlin, DE)
- OpenLink RDF Browser (OpenLink, UK)
- Zitgist RDF Browser (Zitgist, USA)
- Disco Hyperdata Browser (FU Berlin, DE)
- Fenfire (DERI, Ireland)

Mashups (DBpedia Mobile)

Web of data search engines

Crawl the dataspace and provide best-effort query answers over crawled data.
- Falcons (IWS, China)
- Sig.ma (DERI, Ireland)
- Swoogle (UMBC, USA)
- VisiNav (DERI, Ireland)
- Watson (Open University, UK)
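
The "crawl, then answer best-effort" idea can be sketched as a breadth-first crawl with a document budget followed by a triple-pattern match over whatever was collected. This is not how any of the listed engines is implemented; `DATASPACE`, the `http://example.org/...` URIs, and the function names are hypothetical, and lookups are simulated in memory.

```python
from collections import deque

# Hypothetical in-memory dataspace: URI -> triples returned on lookup.
DATASPACE = {
    "http://example.org/a": [
        ("http://example.org/a", "http://example.org/vocab/label", "A"),
        ("http://example.org/a", "http://example.org/vocab/seeAlso", "http://example.org/b"),
    ],
    "http://example.org/b": [
        ("http://example.org/b", "http://example.org/vocab/label", "B"),
        ("http://example.org/b", "http://example.org/vocab/seeAlso", "http://example.org/c"),
    ],
    "http://example.org/c": [
        ("http://example.org/c", "http://example.org/vocab/label", "C"),
    ],
}


def crawl(seed, max_documents):
    """Breadth-first crawl that follows URI-valued objects, up to a budget."""
    seen, queue, index = {seed}, deque([seed]), []
    fetched = 0
    while queue and fetched < max_documents:
        uri = queue.popleft()
        fetched += 1
        for triple in DATASPACE.get(uri, []):
            index.append(triple)
            target = triple[2]
            if target.startswith("http://") and target not in seen:
                seen.add(target)
                queue.append(target)
    return index


def query(index, s=None, p=None, o=None):
    """Match a triple pattern against the index; None acts as a wildcard."""
    return [t for t in index
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


index = crawl("http://example.org/a", max_documents=2)
labels = [t[2] for t in query(index, p="http://example.org/vocab/label")]
print(labels)
```

With a budget of two documents the answer omits the label of `http://example.org/c`: the result is correct for what was crawled but incomplete for the dataspace, which is what "best-effort" means here.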

What are the big players doing?

Yahoo! and Google have started to crawl Linked Data in its RDFa serialization as well as Microformats.

Yahoo!
- provides access to crawled data through the Yahoo! BOSS API
- uses the data within Yahoo! SearchMonkey to make search results more useful and visually appealing

Google
- uses crawled RDF data for its Social Graph API
- uses crawled data to enhance search result snippets for reviews and people

Yahoo! SearchMonkey

Identity

Real-world objects are identified with multiple URIs
- Coupling of identification and retrieval
- Data-coexistence: everybody can say everything about anything
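
When several URIs name the same real-world object, a consumer needs some way to group them. The slides do not name a mechanism; the sketch below assumes identity links between URIs are available (on the Web of Data, `owl:sameAs` is commonly used for this) and clusters them with a union-find structure. Apart from the DBpedia URI taken from the slides, all URIs and the function name are hypothetical.

```python
def cluster_identifiers(same_as_links):
    """Group URIs connected by sameAs-style links using union-find."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


# Hypothetical identity links published by different sources.
links = [
    ("http://dbpedia.org/resource/Calgary", "http://example.org/geo/calgary"),
    ("http://example.org/geo/calgary", "http://example.net/places/42"),
    ("http://example.org/people/alice", "http://example.net/staff/alice"),
]

clusters = cluster_identifiers(links)
for cluster in clusters:
    print(sorted(cluster))
```

The links are treated as symmetric and transitive, so the three place URIs end up in one cluster even though no single source links all of them.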

Enable Clients to retrieve the Schema

Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.

Reuse Terms from Common Vocabularies

Common vocabularies
- Friend-of-a-Friend (FOAF) for describing people and their social network
- SIOC for describing forums and blogs
- SKOS for representing topic taxonomies
- Organization Ontology for describing the structure of organizations
- GoodRelations for describing products and business entities
- Music Ontology for describing artists, albums, and performances
- Review Vocabulary for representing reviews

Common sources of identifiers (URIs) for real-world objects
- LinkedGeoData and GeoNames: locations
- GeneID and UniProt: life science identifiers
- DBpedia: wide range of things

Somebody Pays-As-You-Go

The overall data integration effort is split between the data publisher, the data consumer and third parties.

Data Publisher
- publishes data as RDF
- publishes data in a self-descriptive fashion
- sets links and publishes mappings

Third Parties
- set links pointing at your data
- publish mappings to the Web

Data Consumer
- has to do the rest
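
On the consumer side, "the rest" often starts with applying whatever mappings already exist and leaving everything else as it is. The following is a minimal sketch under stated assumptions: mappings are simple one-to-one property correspondences held in a dictionary, the `http://example.org/vocab/...` terms and sample data are hypothetical, and only the FOAF `name` property URI is a real vocabulary term.

```python
def apply_mappings(triples, property_mappings):
    """Rewrite predicates into the consumer's preferred vocabulary."""
    translated = []
    for subject, predicate, obj in triples:
        # Unmapped terms are kept unchanged rather than dropped.
        target = property_mappings.get(predicate, predicate)
        translated.append((subject, target, obj))
    return translated


# Hypothetical publisher data using its own vocabulary.
published = [
    ("http://example.org/people/alice", "http://example.org/vocab/fullName", "Alice"),
    ("http://example.org/people/alice", "http://example.org/vocab/shoeSize", "38"),
]

# Hypothetical mapping published by the data publisher or a third party.
mappings = {
    "http://example.org/vocab/fullName": "http://xmlns.com/foaf/0.1/name",
}

integrated = apply_mappings(published, mappings)
for triple in integrated:
    print(triple)
```

Keeping unmapped triples is the pay-as-you-go part: the data is usable immediately, and integration improves whenever someone publishes another mapping.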

Summary

Linked Data moves the dataspace vision to a global scale and adds the social/community aspect to it.

The Web of Linked Data is growing rapidly
- active deployment communities in different domains
- may already have reached critical mass

Great playground for experimentation
- dataspace profiling
- probabilistic and approximate schema mapping
- data fusion, data quality, and trust
- What will the user interfaces look like?
- Will search engines turn into answer engines?
