the web of interlinked data and knowledge stripped

Post on 10-May-2015


Linked Data for Enterprise Information Integration

Dr. Sören Auer

Creating Knowledge out of Interlinked Data

Why do we need the Data Web?

Problem: Try to search for these things on the current Web:

• Apartments near German-English bilingual childcare in Passau
• ERP service providers with offices in Vienna and London
• Researchers working on multimedia topics in Eastern Europe

Information is available on the Web, but opaque to current search.

• passau.de has everything about childcare in Passau.
• Immobilienscout.de knows all about real estate offers in Germany.

[Figure: DB → Web server → HTML → Search engine, with RDF published alongside the HTML]

Solution: complement text on Web pages with structured linked open data and intelligently combine, integrate and join such structured information from different sources.
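The apartment query above can be sketched as a join over structured records from two sources. This is only an illustration: the records, field names, coordinates and the 1 km radius are hypothetical and do not come from passau.de or Immobilienscout.de.

```python
import math

# Hypothetical structured records, as two independent sites might publish them.
childcare = [
    {"name": "Kita A", "languages": {"de", "en"}, "lat": 48.5740, "lon": 13.4600},
    {"name": "Kita B", "languages": {"de"}, "lat": 48.5660, "lon": 13.4310},
]
apartments = [
    {"id": "apt-1", "lat": 48.5745, "lon": 13.4610},
    {"id": "apt-2", "lat": 48.5400, "lon": 13.3900},
]


def distance_km(a, b):
    """Approximate distance between two points (equirectangular projection)."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return 6371.0 * math.hypot(dx, dy)


def apartments_near_bilingual_childcare(apartments, childcare, max_km=1.0):
    """Join both sources: keep apartments within max_km of a German-English facility."""
    bilingual = [c for c in childcare if {"de", "en"} <= c["languages"]]
    return [
        (a["id"], c["name"])
        for a in apartments
        for c in bilingual
        if distance_km(a, c) <= max_km
    ]


result = apartments_near_bilingual_childcare(apartments, childcare)
```

Neither source could answer the question alone; the join only becomes possible once both expose their data in a structured form.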

Linked Data in a Nutshell

1. Uses the RDF data model

[Figure: RDF graph in which IFMO organizes KESW2012, and KESW2012 starts on 1.10.2012 and takes place in St. Petersburg]

2. Is serialised in triples (subject, predicate, object):

IFMO organizes KESW2012 .
KESW2012 starts "2012-10-01"^^xsd:date .
KESW2012 takesPlaceIn St._Petersburg .

3. Uses content negotiation
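A minimal sketch of the triple model, assuming plain Python tuples stand in for RDF terms (no RDF library is used). The names are taken from the slide's example, with the date written in the xsd:date lexical form; the `match` helper is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

TRIPLES = {
    ("IFMO", "organizes", "KESW2012"),
    ("KESW2012", "starts", '"2012-10-01"^^xsd:date'),
    ("KESW2012", "takesPlaceIn", "St._Petersburg"),
}


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield all triples that agree with the given terms; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Everything stated about KESW2012, sorted for a stable order.
about_event = sorted(match(TRIPLES, s="KESW2012"))
# Who organizes KESW2012?
organizers = [s for s, _, _ in match(TRIPLES, p="organizes", o="KESW2012")]
```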

The emerging Web of Data

[Figure: snapshots of the Linking Open Data cloud diagram, 2007 to 2010]

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Can Linked Data help to solve the Enterprise Information Integration (EII) problem in a Fortune 500 company?

The situation at a world-leading car manufacturer (€97.76 billion revenue, 250,000 employees):

• 3,000 heterogeneous IT systems
• Different units (car, bus, truck etc.) with very different views
• No common language
• Inability to identify crucial entities (parts, locations etc.) enterprise-wide

There is no (and cannot be a) single Enterprise Information Model.

A distributed, iterative, bottom-up integration approach such as Linked Data might be able to help (pay-as-you-go).

Distributed Social Semantic Networking

From Intranet to Enterprise Data Web around a knowledge hub

Linked Data Lifecycle

• Extraction
• Storage / Querying
• Manual revision / Authoring
• Interlinking / Fusing
• Classification / Enrichment
• Quality Analysis
• Evolution / Repair
• Search / Browsing / Exploration

Extraction

From unstructured sources

• NLP, text mining, annotation

From semi-structured sources

• DBpedia, LinkedGeoData, DataCube

From structured sources

• RDB2RDF

Extraction

Transforming Wikipedia into a Knowledge Base

DBpedia extracts structured information from Wikipedia and makes this information available on the Web as LOD:

• ask sophisticated queries against Wikipedia (e.g. universities in Brandenburg, mayors of elevated towns, soccer players)
• link other data sets on the Web to Wikipedia data
• represents a community consensus

The recently launched DBpedia Live transforms Wikipedia into a structured knowledge base.

S. Auer et al.: DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics, Elsevier 2009. Most Cited Article 2006-10 Award.
S. Auer et al.: DBpedia: A Nucleus for a Web of Open Data. 6th International Semantic Web Conference, ISWC07.
S. Auer et al.: What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content. 4th European Semantic Web Conference, ESWC07.

Structure in Wikipedia

• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
  – other language versions
  – other Wikipedia pages
  – to the Web
  – redirects
  – disambiguations

Infobox templates

Wikitext syntax:

{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시 ...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}

RDF representation (http://dbpedia.org/resource/Busan):

dbp:Busan dbpp:title "Busan Metropolitan City" .
dbp:Busan dbpp:hangul "부산 광역시"@Hang .
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float .
dbp:Busan dbpp:pop "3635389"^^xsd:int .
dbp:Busan dbpp:region dbp:Yeongnam .
dbp:Busan dbpp:dialect dbp:Gyeongsang .
...
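The mapping from infobox attributes to triples can be sketched roughly as follows. This is a simplified illustration, not the DBpedia extraction framework: the regular expressions, the datatype guessing and the function name `infobox_to_triples` are assumptions made for this example, and only the `dbp:` / `dbpp:` prefixes are taken from the slide.

```python
import re

WIKITEXT = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}"""

ATTRIBUTE = re.compile(r"^\|\s*(\w+)\s*=\s*(.*?)\s*$")  # "| key = value"
WIKI_LINK = re.compile(r"^\[\[([^\]|]+)\]\]$")          # "[[Page]]"


def infobox_to_triples(subject, wikitext):
    """Turn each '| key = value' line into one (subject, predicate, object) triple."""
    triples = []
    for line in wikitext.splitlines():
        attribute = ATTRIBUTE.match(line)
        if not attribute:
            continue  # template name, closing braces, etc.
        key, value = attribute.groups()
        link = WIKI_LINK.match(value)
        if link:                                   # wiki link -> resource
            obj = "dbp:" + link.group(1).replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):        # integer literal
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):   # decimal literal
            obj = f'"{value}"^^xsd:float'
        else:                                      # plain string literal
            obj = f'"{value}"'
        triples.append((subject, "dbpp:" + key, obj))
    return triples


triples = infobox_to_triples("dbp:Busan", WIKITEXT)
```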

A vast multi-lingual, multi-domain knowledge base

DBpedia extraction results in:

• descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases)
• labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from the English edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) & Mappings Wiki (http://mappings.dbpedia.org) integrate the community into a refinement cycle
• Upcoming: DBpedia inline

DBpedia SPARQL Endpoint

SELECT ?name ?birth ?description ?person WHERE {
  ?person dbp:birthPlace dbp:Berlin .
  ?person skos:subject dbp:Cat:German_musicians .
  ?person dbp:birth ?birth .
  ?person foaf:name ?name .
  ?person rdfs:comment ?description .
  FILTER (LANG(?description) = 'en') .
} ORDER BY ?name
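A query like this is sent to an endpoint via the SPARQL Protocol, where a GET request carries the query text in the `query` parameter. The sketch below only builds the request URL and does not contact any server. The endpoint URL is the DBpedia Live address quoted elsewhere in this deck; the shortened query and the assumption that the endpoint predefines the `dbp:` and `foaf:` prefixes are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://live.dbpedia.org/sparql/"  # address quoted in this deck

# Shortened variant of the slide's query; prefixes are assumed to be
# predefined by the endpoint, otherwise PREFIX declarations are needed.
QUERY = """SELECT ?name WHERE {
  ?person dbp:birthPlace dbp:Berlin .
  ?person foaf:name ?name .
} ORDER BY ?name"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the URL-encoded query text."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# Sending it would look roughly like this (left commented out on purpose):
# import urllib.request
# request = urllib.request.Request(
#     url, headers={"Accept": "application/sparql-results+json"})
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```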

DBpedia Applications: RelFinder

DBpedia Applications (3rd party)

• Muddy Boots (BBC): annotates actors in BBC News with DBpedia identifiers
• Open Calais (Reuters): named entities connected via owl:sameAs to DBpedia
• Faviki (social bookmarking): uses DBpedia to group tags & for multi-language support
• TopBraid Composer (ontology editor): links entities to DBpedia

Extraction: Relational Data

Many different approaches: D2R, Virtuoso RDF Views, Triplify, ...

No agreement on a formal semantics of RDB2RDF mapping

• LOD readiness, SPARQL-SQL translation

W3C RDB2RDF WG

Tool                Triplify                     Sparqlify                     D2RQ            Virtuoso RDF Views
Technology          Scripting languages (PHP)    Java                          Java            Whole middleware solution
SPARQL endpoint     -                            X                             X               X
Mapping language    SQL                          SPARQL CONSTRUCT Views + SQL  RDF based       RDF based
Mapping generation  Manual                       Semi-automatic                Semi-automatic  Manual
Scalability         Medium-high (but no SPARQL)  Very high                     Medium          High

Malhotra, Auer, Erling, Hausenblas: W3C RDB2RDF Incubator Group Report. W3C RDB2RDF Incubator Group, 2009.
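The basic idea behind mapping relational data to RDF can be sketched with a naive table-to-triples conversion: each row becomes a subject, each column a predicate. This is not the mapping language of Triplify, Sparqlify, D2RQ or Virtuoso. The `part` table, the `http://example.org/` base URI and the function `rows_to_triples` are hypothetical.

```python
import sqlite3


def rows_to_triples(conn, table, key, base="http://example.org/"):
    """Map every row to a subject URI and every non-NULL column to one triple.

    The table and key names are interpolated into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{base}{table}/{record[key]}>"
        for column, value in record.items():
            if column == key or value is None:
                continue  # the key is encoded in the URI; NULLs yield no triple
            predicate = f"<{base}vocab/{table}#{column}>"
            triples.append((subject, predicate, f'"{value}"'))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT, plant TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(1, "brake disc", "Bremen"), (2, "axle", None)],
)
triples = rows_to_triples(conn, "part", "id")
```

Real RDB2RDF tools add what this sketch omits: configurable URI patterns, datatype handling, foreign keys mapped to links between resources, and SPARQL-to-SQL rewriting instead of materialising all triples.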


Triplify Light-weight approach for Linked Data publishing from relational databases

Auer, Tramp, Aumüller, Lehmann, Hellmann: Triplify - Light-weight Linked Data Publication from Relational Databases. In 18th International World Wide Web Conference (WWW 2009).

Sparqlify

• Rationale: exploit existing formalisms (SQL, SPARQL CONSTRUCT) as much as possible
• Flexible & versatile mapping language
• Translates one SPARQL query into exactly one efficiently executable SQL query
• Solid theoretical formalization based on SPARQL-relational algebra transformations
• Extremely scalable through an elaborate view candidate selection mechanism
• Used to publish 20B triples for LinkedGeoData

[Figure: Sparqlify as a bridge between SPARQL CONSTRUCT and SQL views]

Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases. Submitted to VLDB-Journal.

Storage and Querying

RDF Data Management

Querying is still slower than relational data management by a factor of 3-20 (BSBM, DBpedia Benchmark), but offers more flexibility.

Performance increases steadily.

Comprehensive, well-supported open-source and commercial implementations are available:

• OpenLink's Virtuoso (open source + commercial)
• Big OWLIM (commercial), Swift OWLIM (open source)
• 4store (open source)
• Dydra (hosted)
• Bigdata (distributed)
• AllegroGraph (commercial)
• Mulgara (open source)

DBpedia Benchmark

• Uses DBpedia as data and a selection of 25 frequently executed queries
• Can generate fractions and multiples of DBpedia's size
• Does not resemble relational data

Performance differences observed with other benchmarks are amplified.

[Figure: benchmark results, geometric mean]

Morsey, Lehmann, Auer, Ngonga: DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data. Int. Semantic Web Conf. (ISWC2011). Best-paper award.

Authoring

Two Kinds of Semantic Wikis

1. Semantic (Text) Wikis
   • Authoring of semantically annotated texts
2. Semantic Data Wikis
   • Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)

OntoWiki: a semantic data wiki

• Versatile domain-independent tool
• Serves as Linked Data / SPARQL endpoint on the Data Web
• Open-source project hosted at Google Code
• Not just a Wiki UI, but a whole framework for the development of Semantic Web applications
• Developed in PHP based on the Zend framework
• Very active developer and user community
• More than 500 downloads monthly
• Large number of use cases, including industry

[1] Auer, Dietzold, Riechert: OntoWiki - A Tool for Social, Semantic Collaboration. 5th International Semantic Web Conference, ISWC 2006.
[2] Riechert, Morgenstern, Auer, Tramp, Martin: Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis. 9th Int. Semantic Web Conference, ISWC2010. Best paper award.


OntoWiki with a car model database loaded

Management of Enterprise Taxonomies with OntoWiki, based on the W3C SKOS standard

Corporate Language Management: 500k concepts in 20 languages

Search for "combi" also finds T-model

A structured knowledge base allows searching for specific data (e.g. cars with more than 6 seats) …

… or less than 5 litres of fuel consumption per 100 km
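The two kinds of search shown on these slides can be sketched as follows: a label search that also consults SKOS-style alternative labels, and a filter over structured attributes. All records, labels and attribute values below are hypothetical and only mimic the slide's examples; they are not taken from the manufacturer's database.

```python
# Hypothetical concepts with a preferred label, alternative labels
# (in the spirit of skos:prefLabel / skos:altLabel) and structured attributes.
CONCEPTS = [
    {"prefLabel": "T-model", "altLabels": ["combi", "estate"],
     "seats": 5, "litres_per_100km": 4.8},
    {"prefLabel": "Van", "altLabels": ["minibus"],
     "seats": 7, "litres_per_100km": 7.9},
]


def search_label(term):
    """Find concepts whose preferred or alternative label matches the term."""
    term = term.lower()
    return [
        c["prefLabel"]
        for c in CONCEPTS
        if term == c["prefLabel"].lower()
        or term in (label.lower() for label in c["altLabels"])
    ]


def select(predicate):
    """Find concepts whose structured attributes satisfy the predicate."""
    return [c["prefLabel"] for c in CONCEPTS if predicate(c)]


by_synonym = search_label("combi")
many_seats = select(lambda c: c["seats"] > 6)
economical = select(lambda c: c["litres_per_100km"] < 5)
```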

Linked Data & Collaboration for the Digital Humanities

Riechert, Morgenstern, Auer, Tramp, Martin: Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis. 9th International Semantic Web Conference (ISWC2010). Best Paper award.

OntoWiki for the Catalogus Professorum Lipsiensis

• Dynamic views on knowledge bases
• RDF triples on resource details page
• Dynamic suggestions from the Data Web

CPM ontology

Catalogus Professorum Lipsiensis

Linking

Image: © CC-BY-NC-ND by ~Dezz~ (residae on flickr)

Linking Entities on the Data Web

In an uncontrolled environment such as the Data Web, there will be a proliferation of equivalent or similar entity identifiers.

Manual link discovery:
• Sindice integration into UIs
• Semantic Pingback

Semi-automatic:
• SILK
• LIMES

Automatic / supervised:
• RAVEN [1]

[1] Ngonga, Lehmann, Auer, Höffner: RAVEN -- Active Learning of Link Specifications, OM@ISWC, 2011.

LIMES: Link Discovery in Metric Spaces

Similarity, equality or relatedness of entities can often be expressed using a distance metric (e.g. strings: edit distance, POIs: Euclidean distance).

• Uses the characteristics of metric spaces, esp. consequences of the triangle inequality:
  d(x, y) ≤ d(x, z) + d(z, y)
  d(x, z) - d(z, y) ≤ d(x, y) ≤ d(x, z) + d(z, y)
• Uses pessimistic approximations of distances instead of computing them
• Only computes distances when needed

The high-performance LIMES framework is available as open source and outperforms the state of the art by an order of magnitude.

Ngonga, Auer: LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data. 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI2011).
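The filtering idea can be sketched as follows. By the triangle inequality, |d(s, e) - d(e, t)| is a lower bound for d(s, t) for any reference point e, so a pair whose lower bound already exceeds the threshold can be discarded without computing its exact distance. This is a simplified illustration and not the LIMES implementation: the choice of exemplars, the 2D points, the threshold and the function names are all assumptions.

```python
import math


def euclid(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def link(sources, targets, theta, exemplars, dist=euclid):
    """Return (matches, exact_comparisons) for all pairs with dist <= theta.

    exact_comparisons counts only source-target distance computations;
    the distances to the exemplars are computed in addition.
    """
    # Distances from every exemplar to every target, computed once.
    to_targets = [[dist(e, t) for t in targets] for e in exemplars]
    matches, exact_comparisons = [], 0
    for s in sources:
        to_source = [dist(s, e) for e in exemplars]
        for j, t in enumerate(targets):
            # Pessimistic approximation: the best lower bound over all exemplars.
            lower_bound = max(
                abs(d_se - row[j]) for d_se, row in zip(to_source, to_targets)
            )
            if lower_bound > theta:
                continue  # cannot be a match, skip the exact computation
            exact_comparisons += 1
            if dist(s, t) <= theta:
                matches.append((s, t))
    return matches, exact_comparisons


sources = [(0, 0), (10, 10)]
targets = [(0, 1), (10, 11), (50, 50)]
matches, exact_comparisons = link(
    sources, targets, theta=1.5, exemplars=[(0, 0), (50, 50)]
)
```

In this toy setting the exemplar distances cost more than they save; the benefit appears when the number of exemplars is small compared to the number of source and target entities, or when the metric is expensive to compute.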

RAVEN - Towards Zero-Configuration Link Discovery

Active learning of link specifications

Ngonga Ngomo, Lehmann, Auer, Höffner: RAVEN: Towards Zero-Configuration Link Discovery. In OM 2012.

Active learning of link specifications

• Experiments even with very large KBs (Diseasome & DBpedia) show that with 10-20 examples an F-score of >95% can be achieved
• A learning iteration takes <1s

Enrichment

Enrichment & Repair

Linked Data is mainly instance data!

The ORE (Ontology Repair and Enrichment) tool improves an OWL ontology by fixing inconsistencies and suggesting further axioms to add.

• Ontology Debugging: uses OWL reasoning to detect inconsistencies and unsatisfiable classes, and identifies the most likely sources of the problems. The user can create a repair plan while maintaining full control.
• Ontology Enrichment: uses the DL-Learner framework to suggest definitions and super classes for existing classes in the KB. Works if instance data is available, for harmonising schema and data.

http://aksw.org/Projects/ORE

Lehmann, Auer, Tramp: Class Expression Learning for Ontology Engineering. Journal of Web Semantics (JWS), 2011.

Improving Linked Data Quality by Ontology Learning

Supervised machine learning task

Given:
• Background knowledge base
• Positive and negative examples (example = individual in ontology)

Goal:
• Find an OWL class expression / DL concept which
  – covers as many positive examples as possible
  – covers as few negative examples as possible

Concept C covers example a <=> a is an instance of C

An analogous problem can be defined for logic programs => Inductive Logic Programming

Hellmann, Lehmann, Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. Int. Journal on Semantic Web & Information Systems (IJSWIS), Vol. 5, Issue 2, April-July 2009, ISSN: 1552-6283.
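The learning problem can be illustrated with a toy search over candidate concepts. This sketch enumerates conjunctions of named classes exhaustively and scores them by coverage. It is not the refinement-based algorithm of DL-Learner, and the knowledge base, the examples and the scoring function are hypothetical.

```python
from itertools import combinations

# Hypothetical background knowledge: individual -> named classes it belongs to.
KB = {
    "anna": {"Person", "Researcher", "Professor"},
    "bert": {"Person", "Researcher"},
    "carl": {"Person", "Student"},
    "dora": {"Person", "Researcher", "Professor"},
}
POSITIVE = {"anna", "dora"}
NEGATIVE = {"bert", "carl"}


def covers(concept, individual):
    """A conjunction of named classes covers a iff a is an instance of each class."""
    return concept <= KB[individual]


def score(concept):
    """Covered positive examples minus covered negative examples."""
    covered_pos = sum(covers(concept, a) for a in POSITIVE)
    covered_neg = sum(covers(concept, a) for a in NEGATIVE)
    return covered_pos - covered_neg


def learn(max_size=2):
    """Return the best-scoring conjunction, preferring shorter expressions."""
    classes = sorted(set().union(*KB.values()))
    candidates = [
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(classes, size)
    ]
    return max(candidates, key=lambda c: (score(c), -len(c)))


best = learn()
```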

Quality Analysis

Linked Data Quality Analysis

Quality on the Data Web varies a lot:
• Hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. knowledge bases extracted from text or Web 2.0 sources (DBpedia)

Research challenge:
• Establish measures for assessing the authority, provenance and reliability of Data Web resources

Opportunity for EII: employ crowd-sourced knowledge from the Data Web in the enterprise

FP7-IP DIACHRON: Managing the Evolution and Preservation of the Data Web. Started April 2013.

Evolution

Image: © CC-BY-SA by alasis on flickr

EvoPat: Pattern-based KB Evolution

• Unified method for data evolution & ontology refactoring
• Modularized, declarative definition of evolution patterns => simple compared to an imperative description
• RDF representation of evolution patterns => patterns can be shared and reused on the Data Web
• Declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality

Rieß, Heino, Dietzold, Auer: EvoPat - Pattern-Based Evolution and Refactoring of RDF Knowledge Bases. In: 9th International Semantic Web Conference ISWC2010.
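What a declarative evolution pattern might look like can be sketched with plain data structures: the pattern states what to match, what to delete and what to insert, and a generic routine applies it. This is an illustration of the idea only, not EvoPat's actual RDF representation. The pattern layout, the example property names and the helper functions are assumptions.

```python
def bind(pattern, triple):
    """Match one triple pattern ('?x' terms are variables); return bindings or None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if bindings.setdefault(term, value) != value:
                return None  # same variable bound to two different values
        elif term != value:
            return None
    return bindings


def instantiate(template, bindings):
    """Replace variables in a triple template by their bound values."""
    return tuple(bindings.get(term, term) for term in template)


def apply_pattern(triples, pattern):
    """Apply a declarative pattern: for each match, delete and insert triples."""
    result = set(triples)
    for triple in triples:
        bindings = bind(pattern["select"], triple)
        if bindings is None:
            continue
        result -= {instantiate(t, bindings) for t in pattern["delete"]}
        result |= {instantiate(t, bindings) for t in pattern["insert"]}
    return result


# Hypothetical pattern: replace a local property by a shared vocabulary term.
RENAME_SURNAME = {
    "select": ("?s", "ex:surname", "?o"),
    "delete": [("?s", "ex:surname", "?o")],
    "insert": [("?s", "foaf:familyName", "?o")],
}

kb = {
    ("ex:a", "ex:surname", '"Auer"'),
    ("ex:a", "rdf:type", "foaf:Person"),
}
evolved = apply_pattern(kb, RENAME_SURNAME)
```

Because the pattern is plain data rather than code, it could itself be serialised and shared, which is the point the slide makes about reusing patterns on the Data Web.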

Exploration

An ecosystem of LOD visualizations

LOD exploration widgets:
• Spatial faceted browsing
• Faceted browsing
• Statistical visualization
• Entity-/facet-based browsing
• Domain-specific visualizations
• …

LOD datasets choreography layer:
• Dataset analysis (size, vocabularies, property histograms etc.)
• Selection of suitable visualization widgets

Brunetti, Auer, García: The Linked Data Visualization Model. To appear in IJSWIS, 2012.

LOD Life-(Washing-)cycle supported by the Debian-based LOD2 Stack

http://stack.lod2.eu

Linked Enterprise Intra Data Webs fill the gap between Intra-/Extranets and EIS/ERP

• Unstructured Information Management
• Structured Information Management

Support the long tail of enterprise information domains:
• Human resources
• Requirements engineering
• Supply chains

When only data needs to be exchanged and integrated, SOA is quite expensive.

Facilitates data integration along value chains within and across enterprises

PricewaterhouseCoopers, Technology Forecast, 2009

Take home messages

• Linked Data is a promising technology for closing the gap between SOA and unstructured information management
• The wealth of knowledge available as LOD can be leveraged as background knowledge for enterprise applications
• The application of Linked Data in the enterprise is still largely unexplored (opportunity)
• Linked Data will make Enterprise Information Integration more flexible, iterative and cost-effective

Auer, Frischmuth, Klímek, Tramp, Unbehauen, Holzweißig, Marquardt: Linked Data in Enterprise Information Integration. Submitted to Semantic Web Journal.

AKSW: Bridging Theory with Applications

Foundations: marrying databases with RDF and ontologies
Tools & Datasets
Applications: bringing the Data Web to end users

• DBpedia: "Semantification" of Wikipedia
• Triplify: "Semantification" of (small) Web applications
• OntoWiki: collaborative creation of explicit knowledge via Semantic Wikis
• LIMES: link discovery framework for metric spaces
• Vakantieland: building Data Web applications
• SoftWiki: distributed, stakeholder-driven requirements engineering
• NLP2RDF: integrating natural language processing tool chains with LOD
• Enterprise Knowledge Bases: realizing knowledge hubs within an enterprise's Data Intranet
• Thesaurus Management: defining corporate language & data
• DL-Learner: machine learning for ontologies
• Catalogus Professorum: prosopographical knowledge base
• LinkedGeoData: "Semantification" of OpenStreetMap
• LESS: semantification syndication
• RDB2RDF: mapping relational data to RDF
• ORE: ontology enrichment & repair

AKSW Team


The LOD2 Gang


Thanks for your attention!

Sören Auer

http://www.informatik.uni-leipzig.de/~auer | http://aksw.org | http://lod2.org

auer@informatik.uni-leipzig.de
