Webinar: Transforming your Graph Analytics with GraphDB

27 October 2016

Uploaded by Ontotext on 07-Jan-2017


TRANSCRIPT

Page 1: Webinar: Transforming your Graph Analytics with GraphDB

27 October 2016

Page 2: Webinar: Transforming your Graph Analytics with GraphDB

Agenda

• The Semantic Web

• Reference Projects

• Resource Description Framework (RDF)

• RDF Schema

• Ontologies, OWL

• Semantic Databases

• GraphDB

• SPARQL

• Linked Open Data

#2

Page 3: Webinar: Transforming your Graph Analytics with GraphDB

Training Portfolio

#3

Introduction to Semantic Technologies

• Overview of the Semantic Technologies landscape
• Advantages of using RDF and triplestores
• Ontologies for more meaningful data
• Reasoning and inference
• Querying with SPARQL
• Linked Open Data

GraphDB for beginners

• Overview of the Semantic Technologies landscape
• Using GraphDB to cut costs and increase revenue
• Domain-specific applications and use cases of GraphDB
• Exploring data using GraphDB
• Gaining insights from SPARQL queries

GraphDB for developers

• RDF and triplestores
• Querying with SPARQL
• Making use of LOD
• Immersive introduction to GraphDB functionality
– components and architecture
– rulesets and inference
– GraphDB Connectors
– plugins and query modifiers

GraphDB for administrators

• GraphDB standard operability
– components and architecture
– rulesets and inference
– performance optimizations
– users and access rights
– setup & maintenance
– common caveats

Semantic use cases, solutions & applications

• How semantic products & solutions can
– cut costs and increase revenue along a market vertical
– increase transparency and accessibility, enrich own data and make use of LOD
• Calculate adoption time and cost; cost of ownership; revenue increase opportunities
• Domain-specific solutions: reference architectures & key technical components

Text analytics and semantics with GATE

• Extract information from text
• Index and query semantically annotated data with GATE Mimir
• Train Machine Learning algorithms for Information Extraction
• Set up and use GATE Cloud
• Develop GATE applications
• Work with GATE Embedded

Eclipse RDF4J

• Overview of Semantic Technologies
• Processing and handling RDF data
• Repository configuration
• Programming with RDF4J
• Extending functionality through
– triplestore storage (GraphDB)
– free text search (LuceneSail)
– geospatial search (GeoSPARQL)
– meta-modeling (SPIN)

Page 4: Webinar: Transforming your Graph Analytics with GraphDB

Announcement

Free webinar: Integrating siloed structured and unstructured data with GraphDB™ 10 November 2016 | 11am EDT | 4pm BST | 6pm EEST

Topics covered:

• Installing GraphDB™ and configuring your repository

• Using simple ontologies for automated reasoning on data

• Transforming, cleaning up and linking your heterogeneous data with OntoRefine

• Loading all of your distributed data in one unified data layer

• Querying and updating your data with SPARQL

• Data visualization with GraphDB™

• More ways to make use of GraphDB™’s capabilities, example use cases

• Comparison of GraphDB™ editions

• Overview of Ontotext’s training portfolio

#4

Page 5: Webinar: Transforming your Graph Analytics with GraphDB

THE SEMANTIC WEB

#5

Page 6: Webinar: Transforming your Graph Analytics with GraphDB

• “Semantic technologies” (ST) is a general term for any software that involves some kind and level of understanding of the meaning of the information it deals with

• Examples: – A search engine retrieving a document mentioning “eagle” when queried for “bird”

– A database that returns Ivan when queried for “?x relativeOf Maria”, when the fact asserted was “Maria motherOf Ivan”

– A navigation system that is more intelligent than what we are already used to, e.g. asking it “take me to the nearest pizza place”.

• However, information on the Web is designed mostly for consumption by human end-users, who can naturally:

– Recognize the meaning behind content and draw conclusions

– Infer new knowledge using context

– Understand background information

Semantic Technologies

#6
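The database example only works because of schema knowledge the slide leaves implicit. Below is a minimal Python sketch. It assumes, and the slide does not say this, that `motherOf` is declared a sub-property of `relativeOf` and that `relativeOf` is symmetric. Under those assumptions it shows how a store can answer `?x relativeOf Maria` from the single asserted fact:

```python
# Facts are (subject, predicate, object) tuples.
facts = {("Maria", "motherOf", "Ivan")}

# Assumed schema (hypothetical): motherOf is a kind of relativeOf,
# and relativeOf is symmetric.
sub_property_of = {"motherOf": "relativeOf"}
symmetric = {"relativeOf"}


def infer(triples):
    """Apply both rules repeatedly until no new triple appears (a fixpoint)."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p in sub_property_of:
                new.add((s, sub_property_of[p], o))
            if p in symmetric:
                new.add((o, p, s))
        if new <= closure:
            return closure
        closure |= new


def query(triples, predicate, obj):
    """Answer '?x predicate obj' by scanning the inferred closure."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


print(query(infer(facts), "relativeOf", "Maria"))  # ['Ivan']
```

Neither the asserted fact nor the query mentions `relativeOf` in the direction asked. The answer comes entirely from the two schema rules.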

Page 7: Webinar: Transforming your Graph Analytics with GraphDB

The Web

• Billions of diverse documents online, but it is not easily possible to automatically:

• Retrieve relevant documents.

• Extract information.

• Combine information in a meaningful way.

• Idea of the Semantic Web: • Also publish machine-processable data on the Web.

• Formulate questions in terms understandable by a machine.

• Do this in a standardized way so machines can interoperate.

• The Web becomes a Web of Data, providing a common framework:

• To share knowledge on the Web across application boundaries

• To infer new relationships between pieces of data.

#7

Page 8: Webinar: Transforming your Graph Analytics with GraphDB

• Use big volumes of diverse structured data to enable better information discovery, exploration and analytics

• Better end-user experience: Know more! – Get more answers in less time

– Discover relationships by linking facts across different datasets and across domains

– Get better recommendations and exploration experience

• Better for enterprises: More efficient information management! – Integrate rich open data in your information architecture – more data with less effort

– Get more efficient in using commercial data sources and integrating them with proprietary data

– Better leverage for your data and content through dynamic and linked data publishing

Semantics add value

#8

Page 9: Webinar: Transforming your Graph Analytics with GraphDB

• Integration of deep and diverse data – Complex domain models

– Instance data reconciliation

• Development of enterprise knowledge platforms – Integration of data silos applications

– Establish enterprise-level data standards

• Content enrichment and retrieval based on deep data – Analyze unstructured textual information

– Recommend content based on semantic fingerprints

• Dynamic Semantic Publishing

• Adaptive eLearning technology

Sweet spot for semantic technology

#9

Page 10: Webinar: Transforming your Graph Analytics with GraphDB

REFERENCE PROJECTS

#10

Page 11: Webinar: Transforming your Graph Analytics with GraphDB

Profile
• Mass media broadcaster founded in 1922
• 23,000 employees and over 5 billion pounds in annual revenue

Goals
• Create a dynamic semantic publishing platform that assembles web pages on the fly using a variety of data sources
• Deliver highly relevant data to website visitors with sub-second response times

Challenges
• BBC journalists author and publish content, which is then statically rendered. The costs and time to do this were high.
• Diverse content was difficult to navigate, and content re-use was not flexible
• User experience needed to be improved with relevant content

"The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform."

John O’Donovan Chief Technical Architect

BBC

#11

Page 12: Webinar: Transforming your Graph Analytics with GraphDB

Future Media BBC MMXII

10 000+ Dynamic Aggregations

Page 13: Webinar: Transforming your Graph Analytics with GraphDB

Profile
• One of the top 3 business media organizations
• Focused on both B2C publishing and B2B services

Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it

Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel, with up to 2 months between inception and go-live

• GraphDB used not only for data, but for content storage as well

• Horizontal platform with focus on organizations, people, GPEs and relations between them

• Automatic extraction of all these concepts and relationships

• A separate stream of work on user-behavior-based recommendation of relevant content and data across all media

Financial Times

#13

Page 14: Webinar: Transforming your Graph Analytics with GraphDB

Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management

Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies

Challenges
• Analysts took hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts with highly relevant searches

• Extracts knowledge from a collection of documents
• Uses GraphDB to intuitively search and filter
• Knowledge base used to suggest searches
• Hyper speed performance
• Huge savings in analyst time
• Accurate results

LMI

#14

Page 15: Webinar: Transforming your Graph Analytics with GraphDB

Profile
• Global bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents

Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science

Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions

AstraZeneca

#15

Page 16: Webinar: Transforming your Graph Analytics with GraphDB

Profile
• Euromoney Institutional Investor PLC, the international online information and events group

Goals
• Create a horizontal platform to serve 100 different publications
• Create a new publishing and information platform, which would include the latest authoring, storing and display technologies, including semantic annotation, search and a triple store repository

Challenges
• Different domains covered
• Sophisticated content analytics, including relation, template and scenario extraction
• Analytics of reports and news of various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions; trades, condition and trade horizons, assets, asset allocations, etc.
• Multi-faceted search
• Completely new content and data infrastructure

Euromoney

#16

Page 17: Webinar: Transforming your Graph Analytics with GraphDB

RESOURCE DESCRIPTION FRAMEWORK (RDF)

#17

Page 18: Webinar: Transforming your Graph Analytics with GraphDB

What is RDF and what is it for?

• Resource Description Framework (RDF)

– A general method for describing data

– By defining relationships between things

• Simple, yet flexible & powerful data model

– Easily merge data from multiple sources

– Even if the underlying schemas differ

• Built around existing Web standards

– XML

– URL (URI)

#18

Page 19: Webinar: Transforming your Graph Analytics with GraphDB

Resources, properties and literals

• Resources can be anything you want to describe

– Information resources can be found on the Web

– Non-information resources are anything else, e.g. people, organizations, places, things, events…

• Uniquely identified by a URI/IRI

– IRI is an internationalised URI

• Or can be identified by a blank node

– A unique anonymous value scoped to the current RDF document

#19

Page 20: Webinar: Transforming your Graph Analytics with GraphDB

Resources, properties and literals

• Properties are the relationships between resources

– e.g. X fatherOf Y (where X and Y are URIs)

– Or attributes, e.g. X rdfs:label “X”

• RDF schemas can define the types of things that properties apply to

• Properties are always identified by a URI

#20

Page 21: Webinar: Transforming your Graph Analytics with GraphDB

Resources, properties and literals

• Literals are instances of datatypes

– e.g. string, integer, date

• Can have a language tag

– e.g. "Mass spectrometer"@EN

• Can have an XML schema datatype

– "1976-01-01T00:00:00Z"^^xsd:dateTime

• Can have no specific type, i.e. just a piece of text

– rdf:PlainLiteral (in RDF 1.1, such simple literals are treated as xsd:string)

#21

Page 22: Webinar: Transforming your Graph Analytics with GraphDB

RDF triples and graphs

• RDF statements are formed of three parts:

– Subject: the resource that the statement is about (URI or blank node)

– Predicate: the property that relates the subject and object (URI)

– Object: either a resource (URI or blank node) or a literal

• A collection of statements makes a directed graph

#22
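To make "a collection of statements makes a directed graph" concrete, here is a small Python sketch. It uses a few Flintstones statements in the deck's style; the `rdfs:label` literal is illustrative. Subjects and objects become nodes, and each predicate labels a directed edge:

```python
from collections import defaultdict

# Each RDF statement is a (subject, predicate, object) triple.
triples = [
    (":Fred", ":hasSpouse", ":Wilma"),
    (":Fred", ":hasChild", ":Pebbles"),
    (":Wilma", ":hasChild", ":Pebbles"),
    (":Pebbles", "rdfs:label", '"Pebbles"'),
]


def to_graph(statements):
    """Build a directed, edge-labelled graph: node -> list of (edge label, target)."""
    graph = defaultdict(list)
    for s, p, o in statements:
        # The subject is the edge's source, the object its target,
        # and the predicate is the edge label.
        graph[s].append((p, o))
    return graph


graph = to_graph(triples)
for label, target in graph[":Fred"]:
    print(":Fred", "--" + label + "->", target)

# Incoming edges: the same node can be the target of several statements.
incoming = [s for s, edges in graph.items() for _, t in edges if t == ":Pebbles"]
print(incoming)  # [':Fred', ':Wilma']
```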

Page 23: Webinar: Transforming your Graph Analytics with GraphDB

• How to model this kind of data?

• Missing values – who’s Pearl’s spouse? • Multiple values – merge them in one or add a new entry?

Relational DB to RDF: an Example

Person | Spouse | Child
Fred | Wilma | Pebbles
Wilma | Fred | Pebbles
Pearl | -unknown- | Wilma
Barney | Betty | Bamm-Bamm
Betty | Barney | Bamm-Bamm
Pebbles | Bamm-Bamm | Roxy, Chip
Bamm-Bamm | Pebbles | Roxy, Chip

#23

Page 24: Webinar: Transforming your Graph Analytics with GraphDB

Relational DB to RDF: an Example

#24

Page 25: Webinar: Transforming your Graph Analytics with GraphDB

Person

ID | Name | Gender
1 | Betty | F
2 | Bamm-Bamm | M
3 | Barney | M

Parent

ParID | ChiID
1 | 2

Spouse

S1ID | S2ID | From | To
1 | 3 | |

Statement

Subject | Predicate | Object
:Human | rdf:type | rdfs:Class
:gender | rdf:type | rdf:Property
:hasChild | rdfs:range | :Human
:hasSpouse | rdfs:range | :Human
:Betty | rdf:type | :Human
:Betty | rdfs:label | "Betty"
:Betty | :gender | "F"
:Bamm-Bamm | rdfs:label | "Bamm-Bamm"
:Bamm-Bamm | :gender | "M"
:Betty | :hasChild | :Bamm-Bamm
:Betty | :hasSpouse | :Barney

Relational DB to RDF: an Example

#25
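A rough Python sketch of the table-to-triples mapping on this slide. The URI scheme (`:` plus the person's name) is our assumption; in practice a mapping language such as R2RML would declare it. We also type every person as `:Human`, while the slide shows this only for Betty. The key step is that foreign-key rows in the Parent and Spouse tables become direct links between resources:

```python
# Rows copied from the Person, Parent and Spouse tables on the slide.
persons = [(1, "Betty", "F"), (2, "Bamm-Bamm", "M"), (3, "Barney", "M")]
parents = [(1, 2)]  # (ParID, ChiID)
spouses = [(1, 3)]  # (S1ID, S2ID); From/To are empty on the slide


def rows_to_triples():
    """Turn each row into triples, replacing numeric IDs with name-based URIs."""
    uri = {pid: ":" + name for pid, name, _ in persons}  # assumed URI scheme
    triples = []
    for pid, name, gender in persons:
        triples.append((uri[pid], "rdf:type", ":Human"))
        triples.append((uri[pid], "rdfs:label", f'"{name}"'))
        triples.append((uri[pid], ":gender", f'"{gender}"'))
    # Join-table rows become direct links between resources.
    for parent, child in parents:
        triples.append((uri[parent], ":hasChild", uri[child]))
    for s1, s2 in spouses:
        triples.append((uri[s1], ":hasSpouse", uri[s2]))
    return triples


for t in rows_to_triples():
    print(" ".join(t), ".")
```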

Page 26: Webinar: Transforming your Graph Analytics with GraphDB

Semantic Data Integration

• A modern way to integrate highly heterogeneous data – Has emerged as the most promising approach in the last decade

• Based on 3 assumptions:

– Everyone uses RDF. Solution: R2RML for RDB, TARQL for CSV, …

– Everyone uses consistent ontologies. Solution: semantic mapping, e.g. owl:equivalentClass, owl:equivalentProperty

– Everyone uses the same URIs. Solution: owl:sameAs, skos:exactMatch

#26

Page 27: Webinar: Transforming your Graph Analytics with GraphDB

Merging RDF Data

• There are no restrictions on merging RDF graphs

• The same URI from different graphs is assumed to identify the same resource

• If a URI is used in multiple graphs then its description is a combination of all properties in all graphs

– i.e. a simple combination of graphs

• This is an enabler for Linked Open Data

– Where different organizations make statements about the same resources

#27
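Because the same URI is assumed to mean the same resource everywhere, merging graphs that contain no blank nodes is just a set union of their triples. The Python sketch below uses two hypothetical source graphs. When graphs do contain blank nodes, those must first be renamed apart so that blank nodes from different graphs are not accidentally identified; the sketch skips that step:

```python
# Two graphs from different (hypothetical) sources that describe the same URI.
graph_a = {
    (":Fred", ":hasSpouse", ":Wilma"),
    (":Fred", ":worksFor", ":RockQuarry"),
}
graph_b = {
    (":Fred", ":hasChild", ":Pebbles"),
    (":Fred", ":hasSpouse", ":Wilma"),  # the same statement, stated again
}


def merge(*graphs):
    """Merge blank-node-free RDF graphs: a plain set union of their triples."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def describe(graph, uri):
    """All (predicate, object) pairs known about a URI across the merged data."""
    return sorted((p, o) for s, p, o in graph if s == uri)


merged = merge(graph_a, graph_b)
print(describe(merged, ":Fred"))
```

The duplicate statement collapses into one triple, and `:Fred`'s description combines the properties from both sources.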

Page 28: Webinar: Transforming your Graph Analytics with GraphDB

Named Graphs (Contexts)

• Triples can belong to named graphs (also a URI)

• Usually modeled in software as quads

– <Subject, Predicate, Object, Context>

– A statement is not required to have a graph (it then belongs to the default graph)

• Named graphs allow subsets of statements to be handled separately

– e.g. deleting all statements in a named graph

• All modern semantic repositories are quadstores (or bigger)

#28
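A small Python sketch of the quad model. Each statement carries a fourth element naming its graph, and `None` stands in for the default graph. The graph names `:graph1` and `:graph2` are hypothetical. Deleting a named graph then becomes a filter on that fourth element, similar in effect to SPARQL Update's DROP GRAPH:

```python
# Each statement is a quad: (subject, predicate, object, graph).
# None stands for the default graph; the named-graph IRIs are hypothetical.
quads = {
    (":Fred", ":hasSpouse", ":Wilma", ":graph1"),
    (":Fred", ":hasChild", ":Pebbles", ":graph1"),
    (":Barney", ":hasSpouse", ":Betty", ":graph2"),
    (":Fred", "rdf:type", ":Human", None),
}


def triples_in(store, graph):
    """Return the triples that belong to one graph (None = default graph)."""
    return {(s, p, o) for s, p, o, g in store if g == graph}


def drop_graph(store, graph):
    """Remove every statement of one named graph; all other graphs are untouched."""
    return {q for q in store if q[3] != graph}


remaining = drop_graph(quads, ":graph1")
print(len(remaining))  # 2
print(triples_in(remaining, ":graph2"))
```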

Page 29: Webinar: Transforming your Graph Analytics with GraphDB

Syntaxes

• The abstract structure of RDF is a collection of statements (triples)

• This can be written down in many ways

– RDF/XML

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns0="http://example.org/elements#">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Helium">
    <ns0:atomicNumber rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2</ns0:atomicNumber>
    <ns0:atomicMass rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">4.002602</ns0:atomicMass>
    <ns0:specificGravity rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.663E-4</ns0:specificGravity>
  </rdf:Description>
</rdf:RDF>

#29

Page 30: Webinar: Transforming your Graph Analytics with GraphDB

Syntaxes


– Turtle

@prefix : <http://example.org/elements#> .
<http://en.wikipedia.org/wiki/Helium>
    :atomicNumber 2 ;
    :atomicMass 4.002602 ;
    :specificGravity 1.663E-4 .

#30

Page 31: Webinar: Transforming your Graph Analytics with GraphDB

Syntaxes


– JSON-LD

– …

[{
  "@id": "http://en.wikipedia.org/wiki/Helium",
  "http://example.org/elements#atomicNumber": [{"@value": 2}],
  "http://example.org/elements#atomicMass": [{"@value": "4.002602", "@type": "http://www.w3.org/2001/XMLSchema#decimal"}],
  "http://example.org/elements#specificGravity": [{"@value": 0.0001663}]
}]

#31
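The three syntaxes encode the same three triples. To check this for the JSON-LD version, the Python sketch below reads the expanded JSON-LD document from the slide using only the standard `json` module. It is deliberately simplified and is not a full JSON-LD processor: there is no `@context` handling, no language tags, and no canonical lexical forms. Native JSON integers map to xsd:integer and other numbers to xsd:double, which matches the datatypes in the RDF/XML version:

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

# The expanded JSON-LD document from the slide, reflowed over several lines.
doc = """[{"@id": "http://en.wikipedia.org/wiki/Helium",
  "http://example.org/elements#atomicNumber": [{"@value": 2}],
  "http://example.org/elements#atomicMass": [{"@value": "4.002602", "@type": "http://www.w3.org/2001/XMLSchema#decimal"}],
  "http://example.org/elements#specificGravity": [{"@value": 0.0001663}]}]"""


def literal(value_obj):
    """Map a JSON-LD value object to a (lexical form, datatype IRI) pair (simplified)."""
    v = value_obj["@value"]
    if "@type" in value_obj:
        return str(v), value_obj["@type"]
    if isinstance(v, bool):  # must be tested before int: bool is a subclass of int
        return str(v).lower(), XSD + "boolean"
    if isinstance(v, int):
        return str(v), XSD + "integer"
    if isinstance(v, float):
        return repr(v), XSD + "double"
    return v, XSD + "string"


def to_triples(expanded):
    """Walk each node object and emit one triple per property value."""
    triples = []
    for node in json.loads(expanded):
        subject = node["@id"]
        for prop, values in node.items():
            if prop.startswith("@"):
                continue  # keywords such as @id are not properties
            for value_obj in values:
                triples.append((subject, prop, literal(value_obj)))
    return triples


for s, p, (lex, dt) in to_triples(doc):
    print(p.split("#")[1], lex, dt.split("#")[1])
```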

Page 32: Webinar: Transforming your Graph Analytics with GraphDB

RDF SCHEMA (RDFS)

#32

Page 33: Webinar: Transforming your Graph Analytics with GraphDB

What is RDF Schema?

• RDFS provides means for

– Defining Classes and Properties

– Defining hierarchies (of classes and properties)

– Defining domain and range of properties

• RDFS differs from XML Schema (XSD)

– Open World Assumption vs. Closed World Assumption

– RDFS is about describing resources, not about validation

• Entailment rules (axioms)

– Infer new triples from existing ones

#33

Page 34: Webinar: Transforming your Graph Analytics with GraphDB

RDFS entailment rules

• Class/Property hierarchies

:Human rdfs:subClassOf :Mammal . :Man rdfs:subClassOf :Human .
# inferred: :Man rdfs:subClassOf :Mammal .

:Fred a :Man .
# inferred: :Fred a :Human . :Fred a :Mammal .

:hasSpouse rdfs:subPropertyOf :relatedTo . :Fred :hasSpouse :Wilma .
# inferred: :Fred :relatedTo :Wilma .

• Inferring types (domain/range restrictions)

:hasSpouse rdfs:domain :Human ; rdfs:range :Human . :Barney :hasSpouse :Betty .
# inferred: :Barney a :Human . :Betty a :Human .

#34
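The Python sketch below forward-chains five of the RDFS entailment rules over the slide's example until no new triple appears. The rules are rdfs2 (domain), rdfs3 (range), rdfs7 (sub-property), rdfs9 (type via subclass) and rdfs11 (subclass transitivity). This is only a subset of RDFS entailment, and the naive nested loop is meant for illustration, not efficiency:

```python
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

# The asserted statements from the slide.
asserted = {
    (":Human", SUBCLASS, ":Mammal"),
    (":Man", SUBCLASS, ":Human"),
    (":Fred", TYPE, ":Man"),
    (":hasSpouse", SUBPROP, ":relatedTo"),
    (":Fred", ":hasSpouse", ":Wilma"),
    (":hasSpouse", DOMAIN, ":Human"),
    (":hasSpouse", RANGE, ":Human"),
    (":Barney", ":hasSpouse", ":Betty"),
}


def rdfs_closure(triples):
    """Apply a subset of the RDFS entailment rules until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: instances of a subclass are instances of the superclass
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # rdfs7: a statement using a sub-property also holds for the super-property
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # rdfs2 / rdfs3: the domain types the subject, the range types the object
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


inferred = rdfs_closure(asserted) - asserted
for t in sorted(inferred):
    print(*t)
```

Besides the slide's conclusions, the closure also contains consequences the slide does not list. For example, `:Betty a :Mammal` follows from the range rule combined with the class hierarchy.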

Page 35: Webinar: Transforming your Graph Analytics with GraphDB

ONTOLOGIES, OWL

#35

Page 36: Webinar: Transforming your Graph Analytics with GraphDB

What is an ontology?

• Different formal specifications provide sharable and reusable knowledge representation

– Examples – taxonomies, thesauri, topic maps, …

• An ontology specification additionally includes

– Description of the classes in some domain and their properties

– Description of the possible relationships between classes and the constraints on how the relationships can be used

– Sometimes, the individuals (members of classes)

#36

Page 37: Webinar: Transforming your Graph Analytics with GraphDB

Web Ontology Language (OWL)

• More expressive than RDFS

– Identity equivalence/difference: sameAs, differentFrom

• More expressive class definitions

– Class intersection, union, complement, disjointness

– Cardinality restrictions

• More expressive property definitions

– Object/Datatype properties

– Transitive, functional, symmetric, inverse properties

– Value restrictions

#37

Page 38: Webinar: Transforming your Graph Analytics with GraphDB

Web Ontology Language (OWL)

• What can be done with OWL?

– Consistency checks – are there contradictions in the logical model?

– Satisfiability checks – are there classes that cannot have any instances?

– Classification – what is the type of a particular instance?

#38

Page 39: Webinar: Transforming your Graph Analytics with GraphDB


An OWL class is declared with the OWL term owl:Class. OWL classes can also be subclassed, as in RDFS: :PetDinosaur rdfs:subClassOf :Dinosaur

Class Construction

[Diagram: Pet Dinosaur as a subclass of Dinosaur]

#39

Page 40: Webinar: Transforming your Graph Analytics with GraphDB


PetDinosaur = intersectionOf(Pet Dinosaur)

Dinosaur = unionOf(WorkingDinosaur PetDinosaur)

These can be combined to make more complex constructions:

WorkingOnlyDinosaur = intersectionOf(complementOf(Pet) Dinosaur)

Class Construction (2)

[Diagram: Venn diagrams of Dinosaur, Pet, Working Dinosaur and Pet Dinosaur]

#40
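Class constructors behave like set operations on the classes' members. The Python sketch below uses hypothetical individuals (Dino, Rex, Bronto, Puss). One caveat: OWL takes complements relative to owl:Thing under the open world assumption, while this sketch uses a small closed universe, so it only illustrates the set logic and is not OWL semantics:

```python
# Hypothetical individuals; each class is modelled as the set of its members.
dinosaur = {"Dino", "Rex", "Bronto"}
pet = {"Dino", "Rex", "Puss"}          # Puss: a pet that is not a dinosaur
working_dinosaur = {"Rex", "Bronto"}
universe = dinosaur | pet | working_dinosaur  # closed toy universe (assumption)

pet_dinosaur = pet & dinosaur                         # intersectionOf(Pet Dinosaur)
union_check = working_dinosaur | pet_dinosaur         # unionOf(WorkingDinosaur PetDinosaur)
working_only_dinosaur = (universe - pet) & dinosaur   # intersectionOf(complementOf(Pet) Dinosaur)

print(sorted(pet_dinosaur))           # ['Dino', 'Rex']
print(union_check == dinosaur)        # True
print(sorted(working_only_dinosaur))  # ['Bronto']
```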

Page 41: Webinar: Transforming your Graph Analytics with GraphDB


owl:oneOf is a class construct that allows a class to be completely defined by a list of named individuals. These individuals are the complete extension of the class, i.e. they represent all the instances that may belong to it. E.g. the class of directors of The Flintstones: oneOf(:BrianLevrock :BrianLevant)

Class Construction (3)

[Diagram: the class “The Flintstones” Directors, containing Brian Levrock and Brian Levant]

#41

Page 42: Webinar: Transforming your Graph Analytics with GraphDB

Equivalence & Disjointness

• Of properties:
:hasSpouse owl:equivalentProperty :marriedTo .
:hasSpouse owl:propertyDisjointWith :hasChild .

• Of classes:
:Human owl:equivalentClass foaf:Person .
:Man owl:disjointWith :Woman .

• Of individuals (instances of classes):
:Fred :playedBy :JohnGoodman .
:JohnGoodman owl:sameAs linkedmdb:actor/31379 .
:PrehistoricAmerica owl:differentFrom dbr:Americas .

#42

Page 43: Webinar: Transforming your Graph Analytics with GraphDB


Cardinalities

A cardinality restriction specifies how many distinct values a property can have for an individual of a particular class.

• Exact value: e.g. a married person has exactly 1 spouse:

:MarriedPerson rdf:type owl:Class .
_:bn1 a owl:Restriction ; owl:onProperty :hasSpouse ; owl:cardinality "1"^^xsd:nonNegativeInteger .
:MarriedPerson rdfs:subClassOf _:bn1 .

• Maximum value: e.g. a person can have at most 2 biological parents:

:Human rdf:type owl:Class .
:hasBioParent rdfs:subPropertyOf :hasParent .
_:bn2 a owl:Restriction ; owl:onProperty :hasBioParent ; owl:maxCardinality "2"^^xsd:nonNegativeInteger .
:Human rdfs:subClassOf _:bn2 .

• Minimum value: e.g. to be a parent, a person has to have at least one child:

:Parent rdf:type owl:Class .
_:bn3 a owl:Restriction ; owl:onProperty :hasChild ; owl:minCardinality "1"^^xsd:nonNegativeInteger .
:Parent rdfs:subClassOf _:bn3 .

#43
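OWL itself reasons under the open world assumption. A reasoner does not flag a married person with no recorded spouse. Also, absent owl:differentFrom, two values for a max-cardinality-1 property lead it to infer the values are the same individual rather than to report an error. The Python sketch below is therefore a closed-world data check of the kind one might run as a validation step, not OWL reasoning. The data is hypothetical, except that Pearl's spouse is unknown, as in the earlier Flintstones table:

```python
from collections import Counter

# Hypothetical data: one asserted class per individual, plus :hasSpouse links.
types = {":Fred": ":MarriedPerson", ":Wilma": ":MarriedPerson", ":Pearl": ":MarriedPerson"}
has_spouse = [(":Fred", ":Wilma"), (":Wilma", ":Fred")]

# The exact-cardinality restriction from the slide as (property, min, max).
restrictions = {":MarriedPerson": (":hasSpouse", 1, 1)}


def check(types, values, restrictions):
    """Closed-world check: list individuals whose value count falls outside [min, max]."""
    problems = []
    for individual, cls in sorted(types.items()):
        if cls not in restrictions:
            continue
        prop, low, high = restrictions[cls]
        count = Counter(s for s, _ in values[prop])[individual]  # 0 if absent
        if not low <= count <= high:
            problems.append((individual, prop, count))
    return problems


print(check(types, {":hasSpouse": has_spouse}, restrictions))
# [(':Pearl', ':hasSpouse', 0)]
```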

Page 44: Webinar: Transforming your Graph Analytics with GraphDB


OWL introduces property characteristics for more expressive inferencing about instances and their properties.

Transitivity:
:Bedrock :partOf :CobblestoneCounty .
:CobblestoneCounty :partOf :PrehistoricAmerica .
:partOf a owl:TransitiveProperty .
# inferred: :Bedrock :partOf :PrehistoricAmerica .

Symmetry:
:Fred :hasSpouse :Wilma .
:hasSpouse a owl:SymmetricProperty .
# inferred: :Wilma :hasSpouse :Fred .

Property Axioms

#44

Page 45: Webinar: Transforming your Graph Analytics with GraphDB

45

Functional:
:Wilma :hasSpouse :Fred .
:Wilma :hasSpouse :MrFlintstone .
:hasSpouse a owl:FunctionalProperty .
# inferred: :Fred owl:sameAs :MrFlintstone .

Inverse:
:Fred :hasChild :Pebbles .
:Wilma :hasChild :Pebbles .
:hasParent owl:inverseOf :hasChild .
# inferred: :Pebbles :hasParent :Fred, :Wilma .

Property Axioms

#45

Page 46: Webinar: Transforming your Graph Analytics with GraphDB

The property axiom InverseFunctional is useful for specifying unique properties that identify an individual, e.g. every person can be the spouse of at most one person:
:Wilma :hasSpouse :Fred .
:MrsFlintstone :hasSpouse :Fred .
:hasSpouse a owl:InverseFunctionalProperty .
# inferred: :Wilma owl:sameAs :MrsFlintstone .


Property Axioms

#46
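The five property axioms from the last three slides can be sketched as forward-chaining rules in Python and run on each slide example. This is a simplified illustration with one function and keyword arguments of our own design. It does not apply owl:sameAs's own semantics (symmetry, transitivity, substitution), and `inverse_of` maps each property to its inverse in one direction only, so both directions must be listed for full owl:inverseOf behaviour:

```python
SAME_AS = "owl:sameAs"


def apply_axioms(triples, transitive=(), symmetric=(), inverse_of=None,
                 functional=(), inverse_functional=()):
    """Forward-chain OWL property characteristics until no new triple appears."""
    inverse_of = inverse_of or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            for s2, p2, o2 in facts:
                if p2 != p:
                    continue
                if p in transitive and o == s2:
                    new.add((s, p, o2))        # chain a->b and b->c into a->c
                if p in functional and s == s2 and o != o2:
                    new.add((o, SAME_AS, o2))  # one subject, two values
                if p in inverse_functional and o == o2 and s != s2:
                    new.add((s, SAME_AS, s2))  # one value, two subjects
        if new <= facts:
            return facts
        facts |= new


towns = apply_axioms({(":Bedrock", ":partOf", ":CobblestoneCounty"),
                      (":CobblestoneCounty", ":partOf", ":PrehistoricAmerica")},
                     transitive={":partOf"})
spouses = apply_axioms({(":Fred", ":hasSpouse", ":Wilma")}, symmetric={":hasSpouse"})
family = apply_axioms({(":Fred", ":hasChild", ":Pebbles"), (":Wilma", ":hasChild", ":Pebbles")},
                      inverse_of={":hasChild": ":hasParent"})
func = apply_axioms({(":Wilma", ":hasSpouse", ":Fred"), (":Wilma", ":hasSpouse", ":MrFlintstone")},
                    functional={":hasSpouse"})
inv_func = apply_axioms({(":Wilma", ":hasSpouse", ":Fred"), (":MrsFlintstone", ":hasSpouse", ":Fred")},
                        inverse_functional={":hasSpouse"})

print((":Bedrock", ":partOf", ":PrehistoricAmerica") in towns)  # True
print((":Fred", SAME_AS, ":MrFlintstone") in func)              # True
```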

Page 47: Webinar: Transforming your Graph Analytics with GraphDB

OWL sublanguages

– OWL Lite – low expressiveness / low computational complexity

– OWL 2 profiles, more restrictive than OWL DL:

– OWL 2 EL: limited to basic classification, but with polynomial-time reasoning

– OWL 2 QL: designed to be translatable to relational database querying

– OWL 2 RL: designed to be efficiently implementable in rule-based systems

– OWL DL – high expressiveness / decidable & complete

– OWL Full – max expressiveness / no guarantees

#47

Page 48: Webinar: Transforming your Graph Analytics with GraphDB

SEMANTIC DATABASES

#48

Page 49: Webinar: Transforming your Graph Analytics with GraphDB

• Efficient indexing of RDF statements – Maintain predicate-object-subject and predicate-subject-object indices

• Support transactions and isolation

– Atomicity, consistency, isolation and durability (ACID) of write and read operations

• Exposes a SPARQL endpoint – Query data from anywhere

• Reasoning or consistency checking – Infer new facts

Why use a Semantic Database?

#49
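Why keep both predicate-subject-object (PSO) and predicate-object-subject (POS) orderings? A triple pattern usually fixes the predicate plus either the subject or the object, and each index answers one of those cases with direct lookups instead of a scan. The Python sketch below illustrates this with nested dictionaries. It is an illustration of the idea only; GraphDB's actual on-disk index structures are more elaborate:

```python
from collections import defaultdict

triples = [
    (":Fred", ":hasChild", ":Pebbles"),
    (":Wilma", ":hasChild", ":Pebbles"),
    (":Pebbles", ":hasChild", ":Roxy"),
    (":Pebbles", ":hasChild", ":Chip"),
    (":Fred", ":hasSpouse", ":Wilma"),
]


def build_indexes(statements):
    """Build PSO (predicate -> subject -> objects) and POS (predicate -> object -> subjects)."""
    pso = defaultdict(lambda: defaultdict(set))
    pos = defaultdict(lambda: defaultdict(set))
    for s, p, o in statements:
        pso[p][s].add(o)
        pos[p][o].add(s)
    return pso, pos


pso, pos = build_indexes(triples)
# :Pebbles :hasChild ?c  -> the subject is known, so look it up in PSO
print(sorted(pso[":hasChild"][":Pebbles"]))  # [':Chip', ':Roxy']
# ?p :hasChild :Pebbles  -> the object is known, so look it up in POS
print(sorted(pos[":hasChild"][":Pebbles"]))  # [':Fred', ':Wilma']
```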

Page 50: Webinar: Transforming your Graph Analytics with GraphDB

• Standard compliance – Unlike most of the NoSQL and graph databases

– Based on a mature set of W3C standards: RDF, RDFS, OWL, SPARQL

• Flexible Schema – Unlike SQL databases

– RDF facilitates dealing with multiple schemata and schema evolution

• Allow for complex queries – Unlike the typical NoSQL databases

– SPARQL allows for comprehensive queries, similar to SQL

– Allows for queries that are not possible in SQL (e.g. over relation types that are unknown in advance)

• Linked Data Ready – RDF is the standard for linked data publication

How are RDF databases different?

#50

Page 51: Webinar: Transforming your Graph Analytics with GraphDB

GRAPHDB

#51

Page 52: Webinar: Transforming your Graph Analytics with GraphDB

GraphDB™ Editions

• GraphDB™ Free

• GraphDB™ Standard

• GraphDB™ Cloud

• GraphDB™ as-a-Service (S4)

• GraphDB™ Enterprise

#52

Page 54: Webinar: Transforming your Graph Analytics with GraphDB

To install GraphDB™ Free Edition, perform these steps:

• on Windows: run the installer and it starts automatically

• Otherwise: unzip the distribution and execute the startup script located in its root directory to start GraphDB and the Workbench interface:

startup.bat (Windows)

./startup.sh (Linux/Unix/Mac OS)

The message below appears in your Terminal and the GraphDB Workbench opens up at http://localhost:7200/.

INFO: Starting ProtocolHandler ["http-bio-7200"]

Opening web app in default browser

GraphDB™ Free Edition Installation Overview

#54

Page 55: Webinar: Transforming your Graph Analytics with GraphDB

Create a new repository by:

• Launching the GraphDB™ Workbench

• Selecting “Admin”

• Selecting “Locations and Repositories”

• Configuring the new repository

GraphDB™ Free Edition Workbench New Repository

http://localhost:7200

#55

Page 56: Webinar: Transforming your Graph Analytics with GraphDB

Manage your repositories

Change the repository from the dropdown menu in the top right corner.

#56

Page 57: Webinar: Transforming your Graph Analytics with GraphDB

Load your data

Many options:

• Through the GraphDB Workbench – Load from local files

– Load from server files

– Load remote content

– Manually enter data in the text area

• Through SPARQL or RDF4J (Sesame) API

• Through the GraphDB LoadRDF tool – a low-level bulk load tool, which writes directly into the database index structures. It is ultra fast and supports parallel inference.

– Can be used only if the repository is empty (great for the initial loading)

#57
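For the "Through SPARQL or RDF4J (Sesame) API" route, here is a hedged Python sketch using only `urllib`. It assumes GraphDB's RDF4J-protocol REST endpoint, where a POST to `/repositories/<id>/statements` with an RDF `Content-Type` adds the data. The repository ID `bedrock` and the helper name `build_load_request` are hypothetical. The sketch only prepares the request; sending it requires a running server:

```python
import urllib.request


def build_load_request(base_url, repository_id, turtle):
    """Prepare (without sending) a request that adds Turtle data to a repository.

    Assumes the RDF4J REST protocol: POST to /repositories/<id>/statements,
    with the RDF format given in the Content-Type header.
    """
    url = f"{base_url}/repositories/{repository_id}/statements"
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


data = """@prefix : <http://www.example.org/bedrock#> .
:Fred :hasChild :Pebbles ."""

# "bedrock" is a hypothetical repository ID; 7200 is the Workbench port shown earlier.
req = build_load_request("http://localhost:7200", "bedrock", data)
print(req.get_method(), req.full_url)

# Against a running GraphDB instance, the request could be sent like this:
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```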

Page 58: Webinar: Transforming your Graph Analytics with GraphDB

Loading Data

Supported File Formats

#58

Page 59: Webinar: Transforming your Graph Analytics with GraphDB

Load your data

Today: Load data from local files through the GraphDB Workbench.

1. Go to Data -> Import.

2. Open the Local files tab and click the Select files icon

#59

Page 60: Webinar: Transforming your Graph Analytics with GraphDB

Explore your data

#60

Page 61: Webinar: Transforming your Graph Analytics with GraphDB

Test the repository by

• Selecting “SPARQL”

• Submitting queries

GraphDB™ Workbench Execute Queries

[Screenshot annotations: 1. Insert data, 2. Query]

http://localhost:7200

#61
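Queries can also be submitted programmatically. The Python sketch below assumes the SPARQL protocol endpoint `/repositories/<id>` that GraphDB exposes through the RDF4J protocol, with the repository ID `bedrock` and the helper names being hypothetical. It prepares a GET request asking for the W3C SPARQL 1.1 JSON results format. It then parses a hand-written sample response in that format, used here for illustration instead of a live server:

```python
import json
import urllib.parse
import urllib.request


def build_query_request(base_url, repository_id, sparql):
    """Prepare a SPARQL protocol GET request that asks for JSON results."""
    query_string = urllib.parse.urlencode({"query": sparql})
    url = f"{base_url}/repositories/{repository_id}?{query_string}"
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_select_results(body):
    """Turn a SPARQL 1.1 JSON results document into a list of {variable: value} rows."""
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are simply absent from a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


req = build_query_request("http://localhost:7200", "bedrock",
                          "SELECT ?parent ?child WHERE { ?parent <http://www.example.org/bedrock#hasChild> ?child }")

# A hand-written response in the W3C JSON results format, for illustration only.
sample = """{"head": {"vars": ["parent", "child"]},
 "results": {"bindings": [
   {"parent": {"type": "uri", "value": "http://www.example.org/bedrock#Fred"},
    "child": {"type": "uri", "value": "http://www.example.org/bedrock#Pebbles"}}]}}"""
print(parse_select_results(sample))
```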

Page 62: Webinar: Transforming your Graph Analytics with GraphDB

Query monitoring and interruption

To track long-running queries, go to Admin -> Query monitoring.

To interrupt a long-running query, click the Abort query button.

#62

Page 63: Webinar: Transforming your Graph Analytics with GraphDB

Ontotext GraphDB Connectors

• Provides extremely fast full-text search, range queries, faceted search and aggregations

• Utilizes an external engine such as Lucene, Solr or Elasticsearch

• Flexible schema mapping: index only what you need

• Real-time synchronization of data in GraphDB and the external engine

• Connector management via SPARQL

• Data querying & update via SPARQL

• Based on the GraphDB plug-in architecture

#63

Page 64: Webinar: Transforming your Graph Analytics with GraphDB

Connectors – Primary Features

•Snippet extraction: highlighting of search terms in the search result

•Faceted search

– e.g. Europeana Food and Drink

•Sorting by any preconfigured field

•Paging of results using offset and limit

•Custom mapping of RDF types to Lucene types

•Specifying which Lucene analyzer to use (the default is Lucene's StandardAnalyzer)

•Weighting an entity by [numeric] value of one or more predicates

•Custom scoring expressions at query time to evaluate score based on Lucene

#64

Page 65: Webinar: Transforming your Graph Analytics with GraphDB

And many more features …

• Blueprints (Apache TinkerPop, aka Gremlin) support – use graph programming frameworks or graph exploration software

• RDF Rank – identify “important” nodes in an RDF graph based on their interconnectedness

• GeoSPARQL support – represent and query geospatial linked data

#65
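The idea behind RDF Rank, scoring nodes by their interconnectedness, is close to PageRank. The Python sketch below uses plain textbook PageRank as a stand-in, treating each resource-to-resource triple as a directed edge. It is an assumption about the general technique, not GraphDB's actual RDF Rank algorithm or parameters. The damping factor 0.85 and the 50 iterations are conventional choices, not GraphDB settings:

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank over a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    targets = {node: [t for s, t in edges if s == node] for node in nodes}
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        nxt = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            outs = targets[node]
            if outs:
                # Pass this node's score along its outgoing edges.
                for t in outs:
                    nxt[t] += damping * score[node] / len(outs)
            else:
                # Dangling node (no outgoing edges): spread its score evenly.
                for t in nodes:
                    nxt[t] += damping * score[node] / n
        score = nxt
    return score


# :hasChild links from the SPARQL examples later in the deck, as (subject, object) edges.
has_child = [(":Pearl", ":Wilma"), (":Wilma", ":Pebbles"), (":Fred", ":Pebbles"),
             (":Barney", ":Bamm-Bamm"), (":Betty", ":Bamm-Bamm"),
             (":Pebbles", ":Roxy"), (":Pebbles", ":Chip"),
             (":Bamm-Bamm", ":Roxy"), (":Bamm-Bamm", ":Chip")]
scores = pagerank(has_child)
for node in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(node, round(scores[node], 3))
```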

Page 66: Webinar: Transforming your Graph Analytics with GraphDB

SPARQL

#66

Page 67: Webinar: Transforming your Graph Analytics with GraphDB

What is SPARQL?

• SQL-like query language for RDF data

• 4 query types:

– Ask, Select, Construct, Describe

• Query extensions:

– Aggregates, Subqueries, Negation, Filters, Optional patterns, …

• Data management updates:

– Insert data, Delete data, Delete/Insert

• Graph management updates:

– Create, Load, Clear, Drop, Copy, Move, Add

#67

Page 68: Webinar: Transforming your Graph Analytics with GraphDB

What is a SPARQL query?

Main idea: Pattern matching

• Queries describe sub-graphs of the queried graph

• Graph patterns are RDF graphs specified in Turtle syntax, which contain variables (prefixed by either “?” or “$”)

• Sub-graphs that match the graph patterns yield a result

:Pebbles :hasChild ?child

:Pebbles :hasChild :Roxy

:Pebbles :hasChild :Chip

#68
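Pattern matching can be sketched in a few lines of Python. In this sketch a triple pattern matches a triple when every constant is equal and every variable binds consistently. Following SPARQL, `?x` and `$x` name the same variable. An ASK query then only needs to know whether at least one match exists. The helper names are ours:

```python
triples = [
    (":Pebbles", ":hasChild", ":Roxy"),
    (":Pebbles", ":hasChild", ":Chip"),
    (":Fred", ":hasChild", ":Pebbles"),
    (":Fred", ":hasSpouse", ":Wilma"),
]


def is_var(term):
    """SPARQL variables start with '?' or '$'."""
    return term[0] in "?$"


def match(pattern, data):
    """Yield a {variable: value} binding for every triple that fits the pattern."""
    for triple in data:
        binding = {}
        for p_term, value in zip(pattern, triple):
            if is_var(p_term):
                # A variable repeated in one pattern must bind to the same value.
                if binding.setdefault(p_term[1:], value) != value:
                    break
            elif p_term != value:
                break
        else:
            yield binding


def ask(pattern, data):
    """ASK: true if the pattern has at least one solution."""
    return next(match(pattern, data), None) is not None


# SELECT ?child WHERE { :Pebbles :hasChild ?child }
print(list(match((":Pebbles", ":hasChild", "?child"), triples)))
# [{'child': ':Roxy'}, {'child': ':Chip'}]
print(ask(("?parent", ":hasChild", "?child"), triples))  # True
```

`ask` checks for `None` rather than relying on truthiness. A pattern with no variables yields an empty binding `{}`, which is falsy but still counts as a solution.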

Page 69: Webinar: Transforming your Graph Analytics with GraphDB

SPARQL query types - ASK

• ASK – tests whether a query pattern has a solution: ASK WHERE {?parent :hasChild ?child}

• Returns: true (yes)

#69

Page 70: Webinar: Transforming your Graph Analytics with GraphDB

SPARQL query types - SELECT

• SELECT – returns variables & their bindings:

SELECT ?parent ?child WHERE {?parent :hasChild ?child}

?parent ?child

:Pearl :Wilma

:Wilma :Pebbles

:Fred :Pebbles

:Barney :Bamm-Bamm

:Betty :Bamm-Bamm

:Pebbles :Roxy

:Pebbles :Chip

:Bamm-Bamm :Roxy

:Bamm-Bamm :Chip

#70

Page 71: Webinar: Transforming your Graph Analytics with GraphDB

Components of a SPARQL query

# Namespace definitions
PREFIX : <http://www.example.org/bedrock#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# Query form + variables
SELECT ?grandParent ?grandChild
# Data sources
FROM <http://www.example.org/bedrock#>
# Query patterns & filters
WHERE {
  ?grandParent :hasChild ?parent .
  ?parent :hasChild ?grandChild .
}
# Solution modifiers
ORDER BY (?grandChild)

(The output is pairs of grandparent and grandchild URIs, ordered alphabetically by grandchild URI.)

#71
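The WHERE clause above is a basic graph pattern: both triple patterns must match, and they share `?parent`, so evaluating it is a join. The Python sketch below evaluates it over the `:hasChild` data from the SELECT slide. It extends each partial solution by matching the next pattern with the already-bound variables substituted in. The helper names are ours, and the evaluation strategy is a simple nested loop, not how GraphDB plans queries:

```python
# :hasChild data from the SELECT slide.
has_child = [(":Pearl", ":Wilma"), (":Wilma", ":Pebbles"), (":Fred", ":Pebbles"),
             (":Barney", ":Bamm-Bamm"), (":Betty", ":Bamm-Bamm"),
             (":Pebbles", ":Roxy"), (":Pebbles", ":Chip"),
             (":Bamm-Bamm", ":Roxy"), (":Bamm-Bamm", ":Chip")]
triples = [(s, ":hasChild", o) for s, o in has_child]


def is_var(term):
    return term[0] in "?$"


def match(pattern, data):
    """Yield a {variable: value} binding for every triple that fits the pattern."""
    for triple in data:
        binding = {}
        for p_term, value in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term[1:], value) != value:
                    break
            elif p_term != value:
                break
        else:
            yield binding


def substitute(pattern, solution):
    """Replace variables that are already bound in the solution with their values."""
    return tuple(solution.get(t[1:], t) if is_var(t) else t for t in pattern)


def bgp(patterns, data):
    """Basic graph pattern: a conjunction (join) of triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [{**sol, **b}
                     for sol in solutions
                     for b in match(substitute(pattern, sol), data)]
    return solutions


rows = bgp([("?grandParent", ":hasChild", "?parent"),
            ("?parent", ":hasChild", "?grandChild")], triples)
rows.sort(key=lambda r: r["grandChild"])  # ORDER BY (?grandChild)
for r in rows:
    print(r["grandParent"], r["grandChild"])
```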

Page 72: Webinar: Transforming your Graph Analytics with GraphDB

Graph patterns

• Basic graph patterns

– A conjunction of triple patterns

• Optional graph pattern

– Specifies optional parts of a pattern (similar to an “outer join” in SQL)

• Union graph patterns

– Specifies disjunctions (alternatives)

#72

Page 73: Webinar: Transforming your Graph Analytics with GraphDB

• Find all pairs of children and parents and include the parent’s workplace if it’s specified in the data.

PREFIX : <http://www.example.org/bedrock#>

SELECT ?parent ?child ?company

WHERE {

?parent :hasChild ?child.

OPTIONAL {?parent :worksFor ?company}

}

Optional graph pattern

?parent | ?child | ?company
:Pearl | :Wilma |
:Wilma | :Pebbles |
:Fred | :Pebbles | :RockQuarry
:Barney | :Bamm-Bamm | :RockQuarry
:Betty | :Bamm-Bamm |
:Pebbles | :Roxy |
:Pebbles | :Chip |
:Bamm-Bamm | :Roxy |
:Bamm-Bamm | :Chip |

#73
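OPTIONAL behaves like a left outer join. Each solution of the required pattern is extended when the optional pattern matches and kept unchanged when it does not. The Python sketch below reproduces the result table above. The `:worksFor` facts are taken from that table (Fred and Barney at `:RockQuarry`), and the helper names are ours:

```python
has_child = [(":Pearl", ":Wilma"), (":Wilma", ":Pebbles"), (":Fred", ":Pebbles"),
             (":Barney", ":Bamm-Bamm"), (":Betty", ":Bamm-Bamm"),
             (":Pebbles", ":Roxy"), (":Pebbles", ":Chip"),
             (":Bamm-Bamm", ":Roxy"), (":Bamm-Bamm", ":Chip")]
works_for = [(":Fred", ":RockQuarry"), (":Barney", ":RockQuarry")]
triples = ([(s, ":hasChild", o) for s, o in has_child]
           + [(s, ":worksFor", o) for s, o in works_for])


def is_var(term):
    return term[0] in "?$"


def match(pattern, data):
    """Yield a {variable: value} binding for every triple that fits the pattern."""
    for triple in data:
        binding = {}
        for p_term, value in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term[1:], value) != value:
                    break
            elif p_term != value:
                break
        else:
            yield binding


def substitute(pattern, solution):
    return tuple(solution.get(t[1:], t) if is_var(t) else t for t in pattern)


def optional(solutions, pattern, data):
    """Left join: extend a solution if the pattern matches, otherwise keep it as is."""
    result = []
    for sol in solutions:
        extended = [{**sol, **b} for b in match(substitute(pattern, sol), data)]
        result.extend(extended or [sol])
    return result


required = list(match(("?parent", ":hasChild", "?child"), triples))
rows = optional(required, ("?parent", ":worksFor", "?company"), triples)
for r in rows:
    print(r["parent"], r["child"], r.get("company", ""))  # unbound ?company prints empty
```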

Page 74: Webinar: Transforming your Graph Analytics with GraphDB

Union graph pattern

• Find children of either Fred or Barney and return pairs of those children with each of their parents.

PREFIX : <http://www.example.org/bedrock#>

SELECT ?parent ?child

WHERE {

{:Fred :hasChild ?child}

UNION

{:Barney :hasChild ?child}

?parent :hasChild ?child

}

?parent ?child

:Wilma :Pebbles

:Fred :Pebbles

:Barney :Bamm-Bamm

:Betty :Bamm-Bamm
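The query above first collects the ?child bindings from both UNION branches. It then joins them with the trailing ?parent :hasChild ?child pattern. A small Python sketch of those two steps over the same :hasChild data (local names instead of IRIs, for brevity):

```python
# Sketch of UNION followed by a join on the shared ?child variable.

HAS_CHILD = [
    ("Pearl", "Wilma"), ("Wilma", "Pebbles"), ("Fred", "Pebbles"),
    ("Barney", "Bamm-Bamm"), ("Betty", "Bamm-Bamm"),
    ("Pebbles", "Roxy"), ("Pebbles", "Chip"),
    ("Bamm-Bamm", "Roxy"), ("Bamm-Bamm", "Chip"),
]


def children_of_fred_or_barney(has_child):
    # {:Fred :hasChild ?child} UNION {:Barney :hasChild ?child}
    fred_branch = [child for parent, child in has_child if parent == "Fred"]
    barney_branch = [child for parent, child in has_child if parent == "Barney"]
    union = fred_branch + barney_branch  # bag union: duplicates are kept
    # join each ?child binding with ?parent :hasChild ?child
    return [(parent, child)
            for child in union
            for parent, child_2 in has_child
            if child_2 == child]


if __name__ == "__main__":
    for row in children_of_fred_or_barney(HAS_CHILD):
        print(row)
```

Because UNION keeps duplicates, a child shared by Fred and Barney would produce repeated rows unless the query used SELECT DISTINCT.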


Order By modifier

• Let’s order those alphabetically by parent URI.

PREFIX : <http://www.example.org/bedrock#>

SELECT ?parent ?child

WHERE {

{:Fred :hasChild ?child}

UNION

{:Barney :hasChild ?child}

?parent :hasChild ?child

}

ORDER BY (?parent)

?parent ?child

:Barney :Bamm-Bamm

:Betty :Bamm-Bamm

:Fred :Pebbles

:Wilma :Pebbles
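ORDER BY sorts the solution sequence after the pattern has been matched. The Python sketch below sorts the four solutions above by ?parent. It assumes the IRIs are compared as plain strings, which is consistent with the output shown. The optional `descending` flag mirrors SPARQL's ORDER BY DESC(?parent).

```python
# Sketch of ORDER BY over full IRIs, assuming plain string comparison.

PREFIX = "http://www.example.org/bedrock#"
# The (?parent, ?child) solutions of the UNION query, in arbitrary order
SOLUTIONS = [
    (PREFIX + "Wilma", PREFIX + "Pebbles"),
    (PREFIX + "Fred", PREFIX + "Pebbles"),
    (PREFIX + "Barney", PREFIX + "Bamm-Bamm"),
    (PREFIX + "Betty", PREFIX + "Bamm-Bamm"),
]


def order_by_parent(solutions, descending=False):
    """ORDER BY (?parent), or ORDER BY DESC(?parent) when descending=True."""
    return sorted(solutions, key=lambda row: row[0], reverse=descending)


def local_name(iri):
    """Strip the namespace, e.g. '...bedrock#Fred' -> 'Fred'."""
    return iri.rsplit("#", 1)[1]


if __name__ == "__main__":
    for parent, child in order_by_parent(SOLUTIONS):
        print(local_name(parent), local_name(child))
```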


Filtering solutions

• Find people who are over 30 years of age.

PREFIX : <http://www.example.org/bedrock#>

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?age

WHERE {

?person foaf:age ?age .

FILTER (?age > 30).

} ORDER BY (?age)

?person ?age

:Fred 44

:Barney 45

Statement

Subject Predicate Object

:Fred foaf:age "44"^^xsd:integer

:Barney foaf:age "45"^^xsd:integer

:Chip foaf:age "1"^^xsd:integer

:Bamm-Bamm foaf:age "22"^^xsd:integer
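FILTER keeps only the solutions whose expression evaluates to true. The Python sketch below applies ?age > 30 to the data table above and then sorts by ?age. Python integers stand in for the "…"^^xsd:integer literals. As we understand the SPARQL spec, comparing a plain string such as "44" with the number 30 is a type error, and FILTER discards solutions whose expression errors. The helper `greater_than` mimics that behaviour (a simplifying assumption) and is not a real library call.

```python
# Sketch of FILTER (?age > 30) followed by ORDER BY (?age).

# (person, foaf:age) pairs; ints stand in for xsd:integer literals
AGES = [("Fred", 44), ("Barney", 45), ("Chip", 1), ("Bamm-Bamm", 22)]


def greater_than(value, limit):
    """Numeric '>'; a non-numeric value is treated like a type error (false)."""
    if not isinstance(value, (int, float)):
        return False
    return value > limit


def older_than(rows, limit):
    kept = [(person, age) for person, age in rows if greater_than(age, limit)]  # FILTER
    return sorted(kept, key=lambda row: row[1])                                 # ORDER BY


if __name__ == "__main__":
    print(older_than(AGES, 30))
```

This is one reason to type the ages as xsd:integer in the data: untyped string values would silently drop out of a numeric FILTER.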


Aggregates

• Aggregates allow computation of values using COUNT, SUM, MIN, MAX, AVG, etc.

• Built around the GROUP BY operator

• For example, computing popularity in a social graph:

SELECT ?person (COUNT(?someone) AS ?popularity)
WHERE {?someone foaf:knows ?person}
GROUP BY ?person

• Prune at group level (cf. FILTER) using HAVING, e.g.:

GROUP BY ?person HAVING (COUNT(?someone) > 4)
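The Python sketch below shows the difference between grouping, aggregating and HAVING. The foaf:knows edges are hypothetical, made up for illustration and not taken from the slides. COUNT runs once per ?person group, and HAVING then removes whole groups, whereas FILTER removes individual rows before grouping.

```python
# Sketch of GROUP BY ?person / COUNT(?someone) / HAVING on hypothetical data.
from collections import Counter

# Hypothetical foaf:knows edges as (?someone, ?person)
KNOWS = [
    ("Fred", "Barney"), ("Wilma", "Barney"), ("Betty", "Barney"),
    ("Pebbles", "Barney"), ("Bamm-Bamm", "Barney"),
    ("Barney", "Fred"), ("Betty", "Fred"),
    ("Fred", "Wilma"),
]


def popularity(knows, having_more_than=None):
    # GROUP BY ?person, computing COUNT(?someone) AS ?popularity per group
    groups = Counter(person for _someone, person in knows)
    if having_more_than is None:
        return dict(groups)
    # HAVING (COUNT(?someone) > n) prunes whole groups after aggregation
    return {person: n for person, n in groups.items() if n > having_more_than}


if __name__ == "__main__":
    print(popularity(KNOWS))
    print(popularity(KNOWS, having_more_than=4))
```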


Expressions in SELECT clauses

• SPARQL 1.1 allows functions to be applied to variables in the head of the query (the SELECT clause)

• For example, to glue together the names of spouses:

SELECT (CONCAT(?wifeName, " and ", ?husbandName, " ", ?husbandSurname) AS ?familyName)
WHERE {
  ?wife a :female;
        foaf:firstName ?wifeName;
        :hasSpouse ?husband.
  ?husband a :male;
           foaf:firstName ?husbandName;
           foaf:familyName ?husbandSurname
}

?familyName

"Wilma and Fred Flintstone"

"Betty and Barney Rubble"

"Pebbles and Bamm-Bamm Rubble"
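CONCAT takes its parts as comma-separated arguments and joins them into one string per solution. The rough Python sketch below shows this. The bindings are taken from the output above, and `concat` is a hypothetical stand-in that ignores language tags and datatypes. It also shows one rule we believe the spec sets out: if an argument is unbound, the expression errors and the projected variable stays unbound, but the row is still returned. That case could only arise here if the WHERE clause used OPTIONAL.

```python
# Sketch of a SELECT expression (CONCAT(...) AS ?familyName); None marks unbound.
from typing import Optional

# One dict of bindings per WHERE-clause solution (names from the slide's output)
SOLUTIONS = [
    {"wifeName": "Wilma", "husbandName": "Fred", "husbandSurname": "Flintstone"},
    {"wifeName": "Betty", "husbandName": "Barney", "husbandSurname": "Rubble"},
    {"wifeName": "Pebbles", "husbandName": "Bamm-Bamm", "husbandSurname": "Rubble"},
]


def concat(*args: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for SPARQL CONCAT over plain strings."""
    if any(arg is None for arg in args):
        return None  # error in the expression: ?familyName is left unbound
    return "".join(args)


def family_names(solutions):
    return [concat(s.get("wifeName"), " and ", s.get("husbandName"), " ",
                   s.get("husbandSurname"))
            for s in solutions]


if __name__ == "__main__":
    for name in family_names(SOLUTIONS):
        print(name)
```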


Property Paths

• SPARQL 1.0 builds graph patterns from triple patterns, where resources are separated in the graph by one arc

• SPARQL 1.1 generalizes triple patterns to model resources separated by paths of arbitrary length

• e.g. Get the names of all ancestors, regardless of how many links away:

SELECT ?ancestor WHERE {?person :hasParent+/foaf:name ?ancestor}

• e.g. Get the names of all ancestors and of the person itself:

SELECT ?ancestor WHERE {?person :hasParent*/foaf:name ?ancestor}
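Matching :hasParent+ amounts to computing reachability in the graph. The Python sketch below does this with a breadth-first search. It assumes that :hasParent is simply the inverse of the :hasChild data used earlier. It stops at the ancestor nodes and skips the trailing /foaf:name step. The * variant differs only by the zero-length path, which links each person to itself.

```python
# Sketch of :hasParent+ and :hasParent* as breadth-first reachability.
from collections import deque

HAS_CHILD = [
    ("Pearl", "Wilma"), ("Wilma", "Pebbles"), ("Fred", "Pebbles"),
    ("Barney", "Bamm-Bamm"), ("Betty", "Bamm-Bamm"),
    ("Pebbles", "Roxy"), ("Pebbles", "Chip"),
    ("Bamm-Bamm", "Roxy"), ("Bamm-Bamm", "Chip"),
]

# Assumption: ?x :hasParent ?y holds exactly when ?y :hasChild ?x does
HAS_PARENT = {}
for parent, child in HAS_CHILD:
    HAS_PARENT.setdefault(child, []).append(parent)


def reachable(start, zero_or_more=False):
    """Nodes reached via :hasParent+ (or :hasParent* when zero_or_more=True)."""
    reached = set()
    queue = deque([start])
    while queue:  # the `reached` set also stops the search from looping on cycles
        node = queue.popleft()
        for parent in HAS_PARENT.get(node, []):
            if parent not in reached:
                reached.add(parent)
                queue.append(parent)
    if zero_or_more:
        reached.add(start)  # the zero-length path connects start to itself
    return reached


if __name__ == "__main__":
    print(sorted(reachable("Roxy")))
```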


SPARQL 1.1 Data Management

• 3 ways to modify data within a graph:

INSERT DATA { :Fred foaf:name "Freddy Flintstone". :Fred foaf:firstName "Freddy" }

DELETE DATA { :Fred foaf:name "Fred Flintstone". :Fred foaf:firstName "Fred" }

DELETE {?person foaf:name "Freddy Flintstone"; foaf:firstName "Freddy".}
INSERT {?person foaf:name "Fred Flintstone"; foaf:firstName "Fred".}
WHERE {?person foaf:name "Freddy Flintstone"; foaf:firstName "Freddy".}

• 2 ways to further change the data within a graph:

LOAD <http://www.example.org/bedrock/> INTO GRAPH <http://ontotext.com/bedrock#>

CLEAR GRAPH <http://ontotext.com/bedrock#>
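The third form (DELETE … INSERT … WHERE) proceeds in three steps. It evaluates WHERE once, against the graph as it stands before the update. It then fills both templates with each solution's bindings. Finally it removes the deleted triples and adds the inserted ones. The Python sketch below follows those steps over a set of string triples. The helper names are hypothetical, and the foaf:age triple is included only to show that unmatched data is left alone.

```python
# Sketch of DELETE {...} INSERT {...} WHERE {...} over a set of string triples.

GRAPH = {
    (":Fred", "foaf:name", "Freddy Flintstone"),
    (":Fred", "foaf:firstName", "Freddy"),
    (":Fred", "foaf:age", "44"),
}
DELETE_TEMPLATE = [("?person", "foaf:name", "Freddy Flintstone"),
                   ("?person", "foaf:firstName", "Freddy")]
INSERT_TEMPLATE = [("?person", "foaf:name", "Fred Flintstone"),
                   ("?person", "foaf:firstName", "Fred")]


def where_freddy(graph):
    """WHERE {?person foaf:name "Freddy Flintstone"; foaf:firstName "Freddy".}"""
    return [{"?person": s} for (s, p, o) in graph
            if p == "foaf:name" and o == "Freddy Flintstone"
            and (s, "foaf:firstName", "Freddy") in graph]


def instantiate(template, solution):
    """Substitute a solution's bindings into every template triple."""
    return {tuple(solution.get(term, term) for term in triple) for triple in template}


def delete_insert(graph, where, delete_template, insert_template):
    solutions = where(graph)  # WHERE sees the graph before any change is made
    to_delete = set().union(*(instantiate(delete_template, s) for s in solutions))
    to_insert = set().union(*(instantiate(insert_template, s) for s in solutions))
    return (graph - to_delete) | to_insert  # remove first, then add


if __name__ == "__main__":
    for triple in sorted(delete_insert(GRAPH, where_freddy, DELETE_TEMPLATE, INSERT_TEMPLATE)):
        print(triple)
```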


SPARQL 1.1 Graph Management

• A new named graph can be explicitly created

CREATE GRAPH <http://ontotext.com/bedrock#>

• … or dropped

DROP GRAPH <http://ontotext.com/bedrock#>

• … but also

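COPY DEFAULT TO <http://ontotext.com/bedrock#> (replaces the target graph's contents with a copy of the default graph)

MOVE DEFAULT TO <http://ontotext.com/bedrock#> (like COPY, but the source is emptied afterwards)

ADD DEFAULT TO <http://ontotext.com/bedrock#> (adds the default graph's triples to the target, keeping what the target already holds)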
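COPY, MOVE and ADD differ only in what happens to the target's existing triples and to the source. The Python sketch below models a dataset as a dict from graph name to a set of triples. The "DEFAULT" key standing in for the unnamed default graph is an assumption of the sketch. Our reading of the SPARQL 1.1 Update spec is that the default graph can be emptied but not removed, and that naming the same graph as source and target is a no-op.

```python
# Sketch of COPY / MOVE / ADD semantics on a dict of graph name -> set of triples.

DEFAULT = "DEFAULT"  # key standing in for the unnamed default graph


def copy_graph(store, source, target):
    """COPY: the target ends up holding exactly the source's triples."""
    if source == target:
        return
    store[target] = set(store.get(source, set()))


def move_graph(store, source, target):
    """MOVE: like COPY, after which the source is dropped."""
    if source == target:
        return
    copy_graph(store, source, target)
    if source == DEFAULT:
        store[DEFAULT] = set()  # the default graph can only be emptied
    else:
        store.pop(source, None)


def add_graph(store, source, target):
    """ADD: the target keeps its triples and gains the source's; source unchanged."""
    if source == target:
        return
    store[target] = store.get(target, set()) | store.get(source, set())


if __name__ == "__main__":
    bedrock = "http://ontotext.com/bedrock#"
    store = {DEFAULT: {(":Fred", ":hasChild", ":Pebbles")},
             bedrock: {(":Pebbles", ":hasChild", ":Roxy")}}
    add_graph(store, DEFAULT, bedrock)
    print(store)
```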


LINKED OPEN DATA


What is Linked Data?

• “To make the Semantic Web a reality, it is necessary to have a large volume of data available on the Web in a standard, reachable and manageable format. In addition the relationships among data also need to be made available. This collection of interrelated data on the Web can also be referred to as Linked Data. Linked Data lies at the heart of the Semantic Web.” (W3C)

• Linked Data is a set of simple principles that allows publishing, querying and browsing of RDF data, distributed across different servers


Linked Data design principles

1. Unambiguous identifiers for data resources

– “Use URIs as names for things.”

2. Use the structure of the web

– “Use HTTP URIs so that people can look up the names.”

3. Make it easy to discover information about resources

– “When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).”

4. Link the data resource to related resources

– “Include links to other URIs, so that users can discover more things.”


Linked Data design principles

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

What to return for a URI?

• Immediate description: triples where the URI is the subject.

• Backlinks: triples where the URI is the object.

• Related descriptions: information of interest in typical usage scenarios.

• Metadata: information such as authorship and licensing.

• Syntax: RDF descriptions as RDF/XML and human-readable formats

Source: How to Publish Linked Data on the Web - Chris Bizer, Richard Cyganiak, Tom Heath.
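The first two items above are simple selections over the triples. The immediate description is the triples with the URI as subject, and the backlinks are the triples with the URI as object. The Python sketch below assembles both for one URI. The sample triples and the `describe` helper are illustrative assumptions. The SPARQL spec leaves the exact result of a DESCRIBE query to the implementation, so a given server may return more or less than this.

```python
# Sketch: immediate description plus backlinks for one URI.

TRIPLES = [
    (":Fred", ":hasChild", ":Pebbles"),
    (":Wilma", ":hasChild", ":Pebbles"),
    (":Pebbles", ":hasChild", ":Roxy"),
    (":Fred", "foaf:age", "44"),
]


def describe(uri, triples):
    """Triples with uri as subject, followed by triples with uri as object."""
    immediate = [t for t in triples if t[0] == uri]   # immediate description
    backlinks = [t for t in triples if t[2] == uri]   # backlinks
    return immediate + backlinks


if __name__ == "__main__":
    for triple in describe(":Pebbles", TRIPLES):
        print(triple)
```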


Linked Data design principles

4. Include links to other URIs, so that users can discover more things

There are several ways to reuse URIs:

Instance level

• direct reuse
• (OWL) sameAs
• (SKOS) exactMatch, closeMatch
• (RDFS) seeAlso

Schema level

• direct reuse of class/property
• (RDFS) sub-class/-property
• (OWL) equivalent class/property
• (SKOS) broadMatch


Linked Data 5 Star

★ Data is available on the Web.

★★ Data is available as machine-readable structured data.

★★★ Non-proprietary formats are used.

★★★★ Individual data identified with open standards.

★★★★★ Data is linked to other data providers.


Linked Data evolution (2007)

(c) R. Cyganiak & A. Jentzsch


Linked Data evolution (2008)

(c) R. Cyganiak & A. Jentzsch


Linked Data evolution (2009)

(c) R. Cyganiak & A. Jentzsch


Linked Data evolution (2010)

(c) R. Cyganiak & A. Jentzsch


Linked Data evolution (2011)

(c) R. Cyganiak & A. Jentzsch


Linked Data evolution (2014)

(c) R. Cyganiak & A. Jentzsch


State of LOD

(c) Bizer, Cyganiak & Jentzsch

(Charts: number of triples; number of out-links)


COMMONLY USED LOD DATASETS


GeoNames

• The GeoNames geographical database covers all countries and contains over 11M placenames that are available for download free of charge.


VIAF

• Virtual International Authority File: records from 20 national libraries and 15 other contributors, covering 35M persons, organizations, places and conferences


Wikidata

• Provides structured data to Wikipedias.

• Over 19M entities


And many more …

• DBpedia: extracts structured data from Wikipedias.

– 4.5M en, 3M de, …

– (Wikidata: provides structured data to Wikipedias)

• Freebase: basis of the Google Knowledge Graph, being phased out

• Data from Schema.org marked up websites

• Europeana: over 50M records from museums, libraries, archives and multi-media collections (Ontotext hosts the EDM SPARQL repository)

• OpenTED: EU Tenders Electronic Daily

• LinkedMDB: movies

• BBC Sports, Wildlife, Music, Programmes (started by Ontotext)


Support and FAQs

[email protected]

Additional resources:

Ontotext:
• Community Forum and Evaluation Support: http://stackoverflow.com/questions/tagged/graphdb
• GraphDB Website and Documentation: http://graphdb.ontotext.com
• Whitepapers, Fundamentals: http://ontotext.com/knowledge-hub/fundamentals/

SPARQL, OWL, and RDF:
• RDF: http://www.w3.org/TR/rdf11-concepts/
• RDFS: http://www.w3.org/TR/rdf-schema/
• SPARQL Overview: http://www.w3.org/TR/sparql11-overview/
• SPARQL Query: http://www.w3.org/TR/sparql11-query/
• SPARQL Update: http://www.w3.org/TR/sparql11-update


For Further Information

• Georgi Georgiev, Head of Global Alliances Development

[email protected]

– 359.882.885.636

• Ilian Uzunov, Europe Sales and Business Development

[email protected]

– 359.888.772.248

• Peio Popov, North America Sales and Business Development

[email protected]

– 1.929.239.0659

Training Portfolio

Introduction to Semantic Technologies

• Overview of the Semantic Technologies landscape
• Advantages of using RDF and triplestores
• Ontologies for more meaningful data
• Reasoning and inference
• Querying with SPARQL
• Linked Open Data

GraphDB for beginners

• Overview of the Semantic Technologies landscape
• Using GraphDB to cut costs and increase revenue
• Domain-specific applications and use cases of GraphDB
• Exploring data using GraphDB
• Gaining insights from SPARQL queries

GraphDB for developers

• RDF and triplestores
• Querying with SPARQL
• Making use of LOD
• Immersive introduction to GraphDB functionality
• Components and architecture
• Rulesets and inference
• GraphDB Connectors
• Plugins and query modifiers

GraphDB for administrators

• GraphDB standard operability
• Components and architecture
• Rulesets and inference
• Performance optimizations
• Users and access rights
• Setup & maintenance
• Common caveats

Semantic use cases, solutions & applications

• How semantic products & solutions can:
  – cut costs and increase revenue along a market vertical
  – increase transparency and accessibility, enrich own data and make use of LOD
• Calculate adoption time and cost; cost of ownership; revenue increase opportunities
• Domain-specific solutions: reference architectures & key technical components

Text analytics and semantics with GATE

• Extract information from text
• Index and query semantically annotated data with GATE Mimir
• Train Machine Learning algorithms for Information Extraction
• Set up and use GATE Cloud
• Develop GATE applications
• Work with GATE Embedded

Eclipse RDF4J

• Overview of Semantic Technologies
• Processing and handling RDF data
• Repository configuration
• Programming with RDF4J
• Extending functionality through:
  – triplestore storage (GraphDB)
  – free text search (LuceneSail)
  – geospatial search (GeoSPARQL)
  – meta-modeling (SPIN)

Announcement

Free webinar: Integrating siloed structured and unstructured data with GraphDB™, 10 November 2016 | 11am EST | 4pm GMT | 6pm EET

Topics covered:

• Installing GraphDB™ and configuring your repository

• Using simple ontologies for automated reasoning on data

• Transforming, cleaning up and linking your heterogeneous data with OntoRefine

• Loading all of your distributed data in one unified data layer

• Querying and updating your data with SPARQL

• Data visualization with GraphDB™

• More ways to make use of GraphDB™’s capabilities, example use cases

• Comparison of GraphDB™ editions

• Overview of Ontotext’s training portfolio


The End