Webinar: Transforming Your Graph Analytics with GraphDB
TRANSCRIPT
27 October 2016
Agenda
• The Semantic Web
• Reference Projects
• Resource Description Framework (RDF)
• RDF Schema
• Ontologies, OWL
• Semantic Databases
• GraphDB
• SPARQL
• Linked Open Data
#2
Training Portfolio
#3
Introduction to Semantic Technologies
• Overview of the Semantic Technologies landscape
• Advantages of using RDF and triplestores
• Ontologies for more meaningful data
• Reasoning and inference
• Querying with SPARQL
• Linked Open Data
GraphDB for beginners
• Overview of the Semantic Technologies landscape
• Using GraphDB to cut costs and increase revenue
• Domain-specific applications and use cases of GraphDB
• Exploring data using GraphDB
• Gaining insights from SPARQL queries
GraphDB for developers
• RDF and triplestores
• Querying with SPARQL
• Making use of LOD
• Immersive introduction to GraphDB functionality
• Components and architecture
• Rulesets and inference
• GraphDB Connectors
• Plugins and query modifiers
GraphDB for administrators
• GraphDB standard operability
• Components and architecture
• Rulesets and inference
• Performance optimizations
• Users and access rights
• Setup & maintenance
• Common caveats
Semantic use cases, solutions & applications
• How semantic products & solutions can
– cut costs and increase revenue along a market vertical
– increase transparency and accessibility, enrich own data and make use of LOD
• Calculate adoption time and cost; cost of ownership; revenue increase opportunities
• Domain-specific solutions: reference architectures & key technical components
Text analytics and semantics with GATE
• Extract information from text
• Index and query semantically annotated data with GATE Mimir
• Train Machine Learning algorithms for Information Extraction
• Set up and use GATE Cloud
• Develop GATE applications
• Work with GATE Embedded
Eclipse RDF4J
• Overview of Semantic Technologies
• Processing and handling RDF data
• Repository configuration
• Programming with RDF4J
• Extending functionality through
– triplestore storage (GraphDB)
– free text search (LuceneSail)
– geospatial search (GeoSPARQL)
– meta-modeling (SPIN)
Announcement
Free webinar: Integrating siloed structured and unstructured data with GraphDB™
10 November 2016 | 11am EDT | 4pm BST | 6pm EEST
Topics covered:
• Installing GraphDB™ and configuring your repository
• Using simple ontologies for automated reasoning on data
• Transforming, cleaning up and linking your heterogeneous data with OntoRefine
• Loading all of your distributed data in one unified data layer
• Querying and updating your data with SPARQL
• Data visualization with GraphDB™
• More ways to make use of GraphDB™’s capabilities, example use cases
• Comparison of GraphDB™ editions
• Overview of Ontotext’s training portfolio
#4
THE SEMANTIC WEB
#5
Semantic Technologies
• “Semantic technologies” (ST) is a general term for any software that involves some kind and level of understanding of the meaning of the information it deals with
• Examples:
– A search engine retrieving a document mentioning “eagle” when queried for “bird”
– A database that returns Ivan when queried for “?x relativeOf Maria”, when the fact asserted was “Maria motherOf Ivan”
– A navigation system that is more intelligent than what we are already used to, e.g. one that can be asked “take me to the nearest pizza place”
• But information on the Web is designed for consumption mostly by human end-users, as they can naturally:
– Recognize the meaning behind content and draw conclusions,
– Infer new knowledge using context, and
– Understand background information
#6
The Web
• Billions of diverse documents online, but it is not easy to automatically:
– Retrieve relevant documents.
– Extract information.
– Combine information in a meaningful way.
• Idea of the Semantic Web:
– Also publish machine-processable data on the web.
– Formulate questions in terms understandable by a machine.
– Do this in a standardized way so machines can interoperate.
• The Web becomes a Web of Data, providing a common framework:
– To share knowledge on the Web across application boundaries.
– To infer new relationships between pieces of data.
#7
Semantics add value
• Use big volumes of diverse structured data to enable better information discovery, exploration and analytics
• Better end-user experience: Know more!
– Get more answers in less time
– Discover relationships by linking facts across different datasets and across domains
– Get better recommendations and exploration experience
• Better for enterprises: More efficient information management!
– Integrate rich open data in your information architecture – more data with less effort
– Get more efficient in using commercial data sources and integrating them with proprietary data
– Better leverage for your data and content through dynamic and linked data publishing
#8
Sweet spot for semantic technology
• Integration of deep and diverse data
– Complex domain models
– Instance data reconciliation
• Development of enterprise knowledge platforms
– Integration of data silos applications
– Establish enterprise-level data standards
• Content enrichment and retrieval based on deep data
– Analyze unstructured textual information
– Recommend content based on semantic fingerprints
• Dynamic Semantic Publishing
• Adaptive eLearning technology
#9
REFERENCE PROJECTS
#10
Profile
• Mass media broadcaster founded in 1922
• 23,000 employees and over 5 billion pounds in annual revenue
Goals
• Create a dynamic semantic publishing platform that assembles web pages on the fly from a variety of data sources
• Deliver highly relevant data to web site visitors with sub-second response
Challenges
• BBC journalists author and publish content, which is then statically rendered. The costs and time to do this were high.
• Diverse content was difficult to navigate; content re-use was not flexible
• User experience needed to be improved with relevant content
"The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform."
John O’Donovan Chief Technical Architect
BBC
#11
Future Media BBC MMXII
10 000+ Dynamic Aggregations
Financial Times
Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services
Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it
Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel with up to 2 months between inception and go-live
• GraphDB used not only for data, but for content storage as well
• Horizontal platform with focus on organizations, people, GPEs and relations between them
• Automatic extraction of all these concepts and relationships
• Separate stream of work for a user-behavior-based recommendation of relevant content and data across the entire media
#13
LMI
Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management
Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies
Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches
• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• Knowledge base used to suggest searches
• Hyper speed performance
• Huge savings in analyst time
• Accurate results
#14
AstraZeneca
Profile
• Global bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions
#15
Euromoney
Profile
• Euromoney Institutional Investor PLC, the international online information and events group
Goals
• Create a horizontal platform to serve 100 different publications
• Create a new publishing and information platform which would include the latest authoring, storing, and display technologies, including semantic annotation, search and a triple store repository
Challenges
• Different domains covered
• Sophisticated content analytics incl. relation, template and scenario extraction
• Analytics of reports and news of various domains
• Extraction of sophisticated macro-economic views on markets and market conditions; trades, condition and trade horizons, assets, asset allocations, etc.
• Multi-faceted search
• Completely new content and data infrastructure
#16
RESOURCE DESCRIPTION FRAMEWORK (RDF)
#17
What is RDF and what is it for?
• Resource Description Framework (RDF)
– A general method for describing data
– By defining relationships between things
• Simple, yet flexible & powerful data model
– Easily merge data from multiple sources
– Even if the underlying schemas differ
• Built around existing Web standards
– XML
– URL (URI)
#18
Resources, properties and literals
• Resources can be anything you want to describe
– Information resources can be found on the Web
– Non-information resources are anything else, e.g. people, organizations, places, things, events…
• Uniquely identified by a URI/IRI
– IRI is an internationalised URI
• Or can be identified by a blank node
– A unique anonymous value scoped to the current RDF document
#19
Resources, properties and literals
• Properties are the relationships between resources
– e.g. X fatherOf Y (where X and Y are URIs)
– Or attributes X rdfs:label “X”
• RDF schemas can define the types of things that properties apply to
• Properties are always identified by a URI
#20
Resources, properties and literals
• Literals are instances of datatypes
– e.g. string, integer, date
• Can have a language tag
– e.g. "Mass spectrometer"@EN
• Can have an XML schema datatype
– "1976-01-01T00:00:00Z"^^xsd:dateTime
• Can have no specific type, i.e. just a piece of text
– rdf:PlainLiteral (in RDF 1.1, such literals have datatype xsd:string)
#21
RDF triples and graphs
• RDF statements are formed of three parts: subject, predicate, object
– Subject: the resource that the statement is about (URI or blank node)
– Predicate: the property that relates the subject and object (URI)
– Object: either a resource (URI or blank node) or a literal
• A collection of statements makes a directed graph
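As a rough illustration of the triple model, the sketch below stores each statement as a Python tuple and treats a graph as a set of such tuples. The `:name` strings are stand-ins for full URIs, and the helper `outgoing` is a hypothetical name introduced here, not part of any RDF library.

```python
# Each RDF statement is a (subject, predicate, object) tuple; a graph is a set of them.
# The ":name" strings stand in for full URIs (illustrative only).
graph = {
    (":Fred", ":hasSpouse", ":Wilma"),
    (":Fred", ":hasChild", ":Pebbles"),
    (":Wilma", ":hasChild", ":Pebbles"),
    (":Fred", "rdfs:label", '"Fred"'),  # the object here is a literal, not a resource
}


def outgoing(graph, subject):
    """Return the labelled edges (predicate, object) that leave a subject node."""
    return sorted((p, o) for s, p, o in graph if s == subject)


fred_edges = outgoing(graph, ":Fred")
print(fred_edges)
```

Because every statement points from its subject to its object, listing the edges that leave a node is enough to walk the directed graph.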
#22
Relational DB to RDF: an Example
• How to model this kind of data?
• Missing values – who’s Pearl’s spouse?
• Multiple values – merge them in one or add a new entry?
Person     Spouse     Child
Fred       Wilma      Pebbles
Wilma      Fred       Pebbles
Pearl      -unknown-  Wilma
Barney     Betty      Bamm-Bamm
Betty      Barney     Bamm-Bamm
Pebbles    Bamm-Bamm  Roxy, Chip
Bamm-Bamm  Pebbles    Roxy, Chip
#23
Relational DB to RDF: an Example
#24
Person
ID  Name       Gender
1   Betty      F
2   Bamm-Bamm  M
3   Barney     M
Parent
ParID  ChiID
1      2
…
Spouse
S1ID  S2ID  From  To
1     3
…
Statement
Subject     Predicate    Object
:Human      rdf:type     rdfs:Class
:gender     rdf:type     rdf:Property
:hasChild   rdfs:range   :Human
:hasSpouse  rdfs:range   :Human
:Betty      rdf:type     :Human
:Betty      rdfs:label   "Betty"
:Betty      :gender      "F"
:Bamm-Bamm  rdfs:label   "Bamm-Bamm"
:Bamm-Bamm  :gender      "M"
:Betty      :hasChild    :Bamm-Bamm
:Betty      :hasSpouse   :Barney
…
Relational DB to RDF: an Example
#25
Semantic Data Integration
• A modern way to integrate highly heterogeneous data
– Has emerged as the most promising approach in the last decade
• Based on 3 assumptions, each with a workaround for when it does not hold:
– Everyone uses RDF. Solution: R2RML for RDB, TARQL for CSV, …
– Everyone uses consistent ontologies. Solution: semantic mapping, e.g. owl:equivalentClass, owl:equivalentProperty
– Everyone uses the same URIs. Solution: owl:sameAs, skos:exactMatch
#26
Merging RDF Data
• There are no restrictions on merging RDF graphs
• The same URI from different graphs is assumed to identify the same resource
• If a URI is used in multiple graphs then its description is a combination of all properties in all graphs
– i.e. a simple combination of graphs
• This is an enabler for Linked Open Data
– Where different organizations make statements about the same resources
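A minimal sketch of why merging is unrestricted, assuming graphs are modelled as Python sets of (subject, predicate, object) tuples: the merge is a plain set union, and the description of a URI is whatever the combined set says about it. The data and the helper `describe` are illustrative assumptions.

```python
# Two graphs published by different sources; both talk about the same URI :Fred.
graph_a = {
    (":Fred", ":hasSpouse", ":Wilma"),
    (":Fred", "foaf:age", 44),
}
graph_b = {
    (":Fred", ":worksFor", ":RockQuarry"),
    (":Fred", ":hasSpouse", ":Wilma"),  # the same statement appears in both graphs
}

# Merging is a simple combination of the statements; duplicates collapse.
merged = graph_a | graph_b


def describe(graph, uri):
    """Collect every (predicate, object) pair stated about a URI."""
    return {(p, o) for s, p, o in graph if s == uri}


fred = describe(merged, ":Fred")
print(sorted(fred, key=str))
```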
#27
Named Graphs (Contexts)
• Triples can belong to named graphs (also a URI)
• Usually modeled in software as quads
– <Subject, Predicate, Object, Context>
– A statement is not required to have a graph (it then belongs to the default graph)
• Named graphs allow subsets of statements to be handled separately
– e.g. deleting all statements in a named graph
• All modern semantic repositories are quadstores (or bigger)
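The quad idea can be sketched as follows, assuming each statement is a 4-tuple whose last element names its graph and `None` stands for the default graph. The graph names and the function `clear_graph` are hypothetical, chosen only to mirror the "delete all statements in a named graph" example.

```python
DEFAULT_GRAPH = None  # assumption: None marks statements without a named graph

quads = {
    (":Fred", ":hasSpouse", ":Wilma", ":family"),
    (":Fred", ":worksFor", ":RockQuarry", ":work"),
    (":Barney", ":worksFor", ":RockQuarry", ":work"),
    (":Fred", "rdfs:label", '"Fred"', DEFAULT_GRAPH),
}


def clear_graph(quads, context):
    """Drop every statement that belongs to one named graph, keep the rest."""
    return {quad for quad in quads if quad[3] != context}


remaining = clear_graph(quads, ":work")
print(len(remaining))
```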
#28
Syntaxes
• The abstract structure of RDF is a collection of statements (triples)
• This can be written down in many ways
– RDF/XML
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns0="http://example.org/elements#">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Helium">
    <ns0:atomicNumber rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2</ns0:atomicNumber>
    <ns0:atomicMass rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">4.002602</ns0:atomicMass>
    <ns0:specificGravity rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.663E-4</ns0:specificGravity>
  </rdf:Description>
</rdf:RDF>
#29
Syntaxes
– Turtle
@prefix : <http://example.org/elements#> .
<http://en.wikipedia.org/wiki/Helium>
    :atomicNumber 2 ;
    :atomicMass 4.002602 ;
    :specificGravity 1.663E-4 .
#30
Syntaxes
– JSON-LD
– …
[
  {
    "@id": "http://en.wikipedia.org/wiki/Helium",
    "http://example.org/elements#atomicNumber": [{"@value": 2}],
    "http://example.org/elements#atomicMass": [{"@value": "4.002602", "@type": "http://www.w3.org/2001/XMLSchema#decimal"}],
    "http://example.org/elements#specificGravity": [{"@value": 0.0001663}]
  }
]
#31
RDF SCHEMA (RDFS)
#32
What is RDF Schema?
• RDFS provides means for
– Defining Classes and Properties
– Defining hierarchies (of classes and properties)
– Defining domain and range of properties
• RDFS differs from XML Schema (XSD)
– Open World Assumption vs. Closed World Assumption
– RDFS is about describing resources, not about validation
• Entailment rules (axioms)
– Infer new triples from existing ones
#33
RDFS entailment rules
• Class/Property hierarchies
Asserted: :Man rdfs:subClassOf :Human . :Human rdfs:subClassOf :Mammal . :Fred a :Man .
Inferred: :Man rdfs:subClassOf :Mammal . :Fred a :Human . :Fred a :Mammal .
Asserted: :hasSpouse rdfs:subPropertyOf :relatedTo . :Fred :hasSpouse :Wilma .
Inferred: :Fred :relatedTo :Wilma .
• Inferring types (domain/range restrictions)
Asserted: :hasSpouse rdfs:domain :Human ; rdfs:range :Human . :Barney :hasSpouse :Betty .
Inferred: :Barney a :Human . :Betty a :Human .
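The entailment examples on this slide can be reproduced with a small forward-chaining loop. This is only a sketch under simplifying assumptions: triples are Python tuples, only the subclass, subproperty, domain and range rules are covered, and `rdfs_closure` is a name made up for this illustration rather than a GraphDB or RDF4J function.

```python
def rdfs_closure(triples):
    """Apply a few RDFS rules repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                # subClassOf is transitive
                new |= {(s, p, o2) for s2, p2, o2 in inferred if p2 == p and s2 == o}
                # members of the subclass are members of the superclass
                new |= {(x, "rdf:type", o) for x, p2, c in inferred
                        if p2 == "rdf:type" and c == s}
            elif p == "rdfs:subPropertyOf":
                # a statement with the subproperty also holds for the superproperty
                new |= {(x, o, y) for x, p2, y in inferred if p2 == s}
            elif p == "rdfs:domain":
                # subjects of the property get the domain class as type
                new |= {(x, "rdf:type", o) for x, p2, y in inferred if p2 == s}
            elif p == "rdfs:range":
                # objects of the property get the range class as type
                new |= {(y, "rdf:type", o) for x, p2, y in inferred if p2 == s}
        if new <= inferred:
            return inferred
        inferred |= new


asserted = {
    (":Man", "rdfs:subClassOf", ":Human"),
    (":Human", "rdfs:subClassOf", ":Mammal"),
    (":Fred", "rdf:type", ":Man"),
    (":hasSpouse", "rdfs:subPropertyOf", ":relatedTo"),
    (":Fred", ":hasSpouse", ":Wilma"),
    (":hasSpouse", "rdfs:domain", ":Human"),
    (":hasSpouse", "rdfs:range", ":Human"),
    (":Barney", ":hasSpouse", ":Betty"),
}
closure = rdfs_closure(asserted)
print(len(closure) - len(asserted), "triples inferred")
```

A production triplestore would use indexes and incremental rule evaluation instead of rescanning all triples; the loop here only shows what "infer new triples from existing ones" means.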
#34
ONTOLOGIES, OWL
#35
What is an ontology?
• Different formal specifications provide sharable and reusable knowledge representation
– Examples – taxonomies, thesauri, topic maps, …
• An ontology specification additionally includes
– Description of the classes in some domain and their properties
– Description of the possible relationships between classes and the constraints on how the relationships can be used
– Sometimes, the individuals (members of classes)
#36
Web Ontology Language (OWL)
• More expressive than RDFS
– Identity equivalence/difference • sameAs, differentFrom
• More expressive class definitions
– Class intersection, union, complement, disjointness
– Cardinality restrictions
• More expressive property definitions
– Object/Datatype properties
– Transitive, functional, symmetric, inverse properties
– Value restrictions
#37
Web Ontology Language (OWL)
• What can be done with OWL?
– Consistency checks – are there contradictions in the logical model?
– Satisfiability checks – are there classes that cannot have any instances?
– Classification – what is the type of a particular instance?
#38
Class Construction
An OWL class is defined by the OWL term owl:Class. OWL classes can also be subclassed as in RDFS:
:PetDinosaur rdfs:subClassOf :Dinosaur
[Diagram labels: Pet Dinosaur, Dinosaur]
#39
Class Construction (2)
PetDinosaur: intersectionOf(Pet Dinosaur)
Dinosaur: unionOf(WorkingDinosaur PetDinosaur)
These can be combined to make more complex constructions:
WorkingOnlyDinosaur: intersectionOf(complementOf(Pet) Dinosaur)
[Diagram labels: Dinosaur, Pet, Working Dinosaur, Pet Dinosaur]
#40
Class Construction (3)
OWL oneOf is a class construct that allows a class to be completely defined from a list of named individuals. We say that these are the complete extension of this class, i.e. they represent all the instances which may belong to the class.
e.g. The class of directors of The Flintstones:
oneOf(:BrianLevrock :BrianLevant)
[Diagram: “The Flintstones” Directors containing Brian Levrock and Brian Levant]
#41
Equivalence & Disjointness
• Of properties
:hasSpouse owl:equivalentProperty :marriedTo
:hasSpouse owl:propertyDisjointWith :hasChild
• Of classes
:Human owl:equivalentClass foaf:Person
:Man owl:disjointWith :Woman
• Of individuals (instances of classes)
:JohnGoodman ^:playedBy :Fred; owl:sameAs linkedmdb:actor/31379
:PrehistoricAmerica owl:differentFrom dbr:Americas
#42
Cardinalities
A cardinality is a specification of how many different values a property can have for an individual of a particular class.
• Exact value: e.g. A married person can have exactly 1 spouse:
:MarriedPerson rdf:type owl:Class .
_:bn1 a owl:Restriction; owl:onProperty :hasSpouse; owl:cardinality "1"^^xsd:nonNegativeInteger .
:MarriedPerson rdfs:subClassOf _:bn1 .
• Maximum value: e.g. A person can have at most 2 biological parents:
:Human rdf:type owl:Class .
:hasBioParent rdfs:subPropertyOf :hasParent .
_:bn2 a owl:Restriction; owl:onProperty :hasBioParent; owl:maxCardinality "2"^^xsd:nonNegativeInteger .
:Human rdfs:subClassOf _:bn2 .
• Minimum value: e.g. To be a parent a person has to have at least one child:
:Parent rdf:type owl:Class .
_:bn3 a owl:Restriction; owl:onProperty :hasChild; owl:minCardinality "1"^^xsd:nonNegativeInteger .
:Parent rdfs:subClassOf _:bn3 .
#43
Property Axioms
OWL introduces property characteristics for more expressivity in inferencing about instances and their properties.
Transitivity
:Bedrock :partOf :CobblestoneCounty .
:CobblestoneCounty :partOf :PrehistoricAmerica .
:partOf a owl:TransitiveProperty .
Inferred: :Bedrock :partOf :PrehistoricAmerica .
Symmetry
:Fred :hasSpouse :Wilma .
:hasSpouse a owl:SymmetricProperty .
Inferred: :Wilma :hasSpouse :Fred .
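To make the transitivity and symmetry examples concrete, here is a small sketch that derives the two inferred statements from the asserted ones. It assumes triples are plain Python tuples and handles only these two property characteristics; `apply_axioms` is a hypothetical helper, not an OWL reasoner API.

```python
def apply_axioms(triples):
    """Derive statements implied by transitive and symmetric properties."""
    facts = set(triples)
    while True:
        transitive = {s for s, p, o in facts
                      if (p, o) == ("rdf:type", "owl:TransitiveProperty")}
        symmetric = {s for s, p, o in facts
                     if (p, o) == ("rdf:type", "owl:SymmetricProperty")}
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))  # swap subject and object
            if p in transitive:
                # chain s -p-> o -p-> o2 into s -p-> o2
                new |= {(s, p, o2) for s2, p2, o2 in facts if p2 == p and s2 == o}
        if new <= facts:
            return facts
        facts |= new


asserted = {
    (":Bedrock", ":partOf", ":CobblestoneCounty"),
    (":CobblestoneCounty", ":partOf", ":PrehistoricAmerica"),
    (":partOf", "rdf:type", "owl:TransitiveProperty"),
    (":Fred", ":hasSpouse", ":Wilma"),
    (":hasSpouse", "rdf:type", "owl:SymmetricProperty"),
}
inferred = apply_axioms(asserted) - asserted
print(sorted(inferred))
```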
#44
Property Axioms
Functional
:Wilma :hasSpouse :Fred .
:Wilma :hasSpouse :MrFlintstone .
:hasSpouse a owl:FunctionalProperty .
Inferred: :Fred owl:sameAs :MrFlintstone .
Inverse
:Fred :hasChild :Pebbles .
:Wilma :hasChild :Pebbles .
:hasParent owl:inverseOf :hasChild .
Inferred: :Pebbles :hasParent :Fred, :Wilma .
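The functional and inverse examples can be sketched in the same style. The code below is a single-pass illustration over Python tuples, with `infer_inverse_and_functional` as a made-up name; a real reasoner would also propagate owl:sameAs through other statements.

```python
def infer_inverse_and_functional(triples):
    """One pass over owl:inverseOf and owl:FunctionalProperty declarations."""
    facts = set(triples)
    new = set()
    for prop_a, relation, prop_b in facts:
        if relation == "owl:inverseOf":
            # every prop_b statement implies the reversed prop_a statement, and vice versa
            new |= {(o, prop_a, s) for s, p, o in facts if p == prop_b}
            new |= {(o, prop_b, s) for s, p, o in facts if p == prop_a}
    functional = {s for s, p, o in facts
                  if (p, o) == ("rdf:type", "owl:FunctionalProperty")}
    for s, p, o in facts:
        if p in functional:
            # two values of a functional property for one subject must be the same thing
            new |= {(o, "owl:sameAs", o2) for s2, p2, o2 in facts
                    if s2 == s and p2 == p and o2 != o}
    return facts | new


asserted = {
    (":Fred", ":hasChild", ":Pebbles"),
    (":Wilma", ":hasChild", ":Pebbles"),
    (":hasParent", "owl:inverseOf", ":hasChild"),
    (":Wilma", ":hasSpouse", ":Fred"),
    (":Wilma", ":hasSpouse", ":MrFlintstone"),
    (":hasSpouse", "rdf:type", "owl:FunctionalProperty"),
}
result = infer_inverse_and_functional(asserted)
print(sorted(result - asserted))
```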
#45
Property Axioms
The property axiom InverseFunctional is useful for specifying unique properties identifying an individual, e.g. every person can be the spouse of at most one person:
:Wilma :hasSpouse :Fred .
:MrsFlintstone :hasSpouse :Fred .
:hasSpouse a owl:InverseFunctionalProperty .
Inferred: :Wilma owl:sameAs :MrsFlintstone .
#46
OWL sublanguages
– OWL Lite – low expressiveness / low computational complexity
– OWL 2 EL: Limited to basic classification, but with polynomial-time reasoning
– OWL 2 QL: Designed to be translatable to relational database querying
– OWL 2 RL: Designed to be efficiently implementable in rule-based systems
– OWL DL – high expressiveness / decidable & complete
– OWL Full – max expressiveness / no guarantees
OWL Lite and the OWL 2 profiles (EL, QL, RL) are more restrictive than OWL DL
#47
SEMANTIC DATABASES
#48
Why use a Semantic Database?
• Efficient indexing of RDF statements
– Maintain predicate-object-subject and predicate-subject-object indexes
• Support transactions and isolation
– Atomicity, consistency, isolation and durability of write and read operations
• Exposes a SPARQL endpoint
– Query data from anywhere
• Reasoning or consistency checking
– Infer new facts
#49
How are RDF databases different?
• Standard compliance
– Unlike most of the NoSQL and graph databases
– Based on a mature set of W3C standards: RDF, RDFS, OWL, SPARQL
• Flexible Schema
– Unlike SQL databases
– RDF facilitates dealing with multiple schemata and schema evolution
• Allow for complex queries
– Unlike the typical NoSQL databases
– SPARQL allows for comprehensive queries, similar to SQL
– Allows for queries that are not possible in SQL (unknown relation types)
• Linked Data Ready
– RDF is the standard for linked data publication
#50
GRAPHDB
#51
GraphDB™ Editions
• GraphDB™ Free
• GraphDB™ Standard
• GraphDB™ Cloud
• GraphDB™ as-a-Service (S4)
• GraphDB™ Enterprise
#52
http://info.ontotext.com/graphdb-free-graphdb
GraphDB™ Free Installation
#53
GraphDB™ Free Edition Installation Overview
To install GraphDB™ Free Edition, perform these steps:
• On Windows: run the installer; GraphDB starts automatically
• Otherwise: unzip the distribution and execute the startup script located in the root directory to start GraphDB and the Workbench interface:
startup.bat (Windows)
./startup.sh (Linux/Unix/Mac OS)
The message below appears in your terminal and the GraphDB Workbench opens at http://localhost:7200/.
INFO: Starting ProtocolHandler ["http-bio-7200"]
…
Opening web app in default browser
#54
Create a new repository by:
• Launching the GraphDB™ Workbench
• Selecting “Admin”
• Selecting “Locations and Repositories”
• Configuring the new repository
GraphDB™ Free Edition Workbench New Repository
http://localhost:7200
#55
Manage your repositories
Change the repository from the dropdown menu in the top right corner.
#56
Load your data
Many options:
• Through the GraphDB Workbench
– Load from local files
– Load from server files
– Load remote content
– Manually enter data in the text area
• Through SPARQL or RDF4J (Sesame) API
• Through the GraphDB LoadRDF tool
– A low-level bulk load tool, which writes directly to the database index structures. It is ultra fast and supports parallel inference.
– Can be performed only if the repository is empty (great for the initial loading)
#57
Loading Data
Supported File Formats
#58
Load your data
Today: Load data from local files through the GraphDB Workbench.
1. Go to Data -> Import.
2. Open the Local files tab and click the Select files icon
#59
Explore your data
#60
Test the repository by
• Selecting “SPARQL”
• Submitting queries
GraphDB™ Workbench Execute Queries
2 Query 1 Insert Data
http://localhost:7200
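Queries can also be sent to the repository over HTTP instead of through the Workbench. The sketch below only builds the request, assuming the RDF4J-style endpoint path `/repositories/<repository id>` on the local Workbench address shown above; the repository id `bedrock` and the function name are hypothetical, and nothing is sent until `urllib.request.urlopen(request)` is called.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_request(base_url, repository_id, query):
    """Build (but do not send) a SPARQL SELECT request against a repository."""
    url = f"{base_url}/repositories/{repository_id}?{urlencode({'query': query})}"
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = (
    "PREFIX : <http://www.example.org/bedrock#> "
    "SELECT ?parent ?child WHERE {?parent :hasChild ?child}"
)
# "bedrock" is a placeholder repository id, not one created in this webinar.
request = build_select_request("http://localhost:7200", "bedrock", query)
print(request.full_url)
```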
#61
Query monitoring and interruption
To track long-running queries, go to Admin -> Query monitoring.
To interrupt a long-running query, click the Abort query button.
#62
Ontotext GraphDB Connectors
• Provides extremely fast full text search, range, faceted search, and aggregations
• Utilize an external engine like Lucene, Solr or Elasticsearch
• Flexible schema mapping: index only what you need
• Real-time synchronization of data in GraphDB and the external engine
• Connector management via SPARQL
• Data querying & update via SPARQL
• Based on the GraphDB plug-in architecture
#63
Connectors – Primary Features
•Snippet extraction: highlighting of search terms in the search result
•Faceted search
– e.g. Europeana Food and Drink
•Sorting by any preconfigured field
•Paging of results using offset and limit
•Custom mapping of RDF types to Lucene types
•Specifying which Lucene analyzer to use (the default is Lucene's StandardAnalyzer)
•Weighting an entity by [numeric] value of one or more predicates
•Custom scoring expressions at query time to evaluate score based on Lucene
#64
And many more features …
• Blueprints (Apache TinkerPop, aka Gremlin) support – use graph programming frameworks or graph exploration software
• RDF Rank – identify “important” nodes in an RDF graph based on their interconnectedness
• GeoSPARQL support – represent and query geospatial linked data
#65
SPARQL
#66
What is SPARQL?
• SQL-like query language for RDF data
• 4 query types:
– Ask, Select, Construct, Describe
• Query extensions:
– Aggregates, Subqueries, Negation, Filters, Optional patterns, …
• Data management updates:
– Insert data, Delete data, Delete/Insert
• Graph management updates:
– Create, Load, Clear, Drop, Copy, Move, Add
#67
What is a SPARQL query?
Main idea: Pattern matching
• Queries describe sub-graphs of the queried graph
• Graph patterns are RDF graphs specified in Turtle syntax, which contain variables (prefixed by either “?” or “$”)
• Sub-graphs that match the graph patterns yield a result
Pattern: :Pebbles :hasChild ?child
Matching sub-graphs:
:Pebbles :hasChild :Roxy
:Pebbles :hasChild :Chip
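Pattern matching can be sketched in a few lines, assuming a graph is a set of Python tuples and that any term starting with "?" is a variable. The functions `match` and `evaluate` are illustrative names, and the nested-loop evaluation is far simpler than what a real SPARQL engine does.

```python
def match(pattern, triple, binding):
    """Extend a variable binding so that the pattern equals the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable is already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return extended


def evaluate(graph, patterns):
    """Evaluate a conjunction of triple patterns and return all solutions."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            for triple in graph:
                extended = match(pattern, triple, solution)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    (":Fred", ":hasChild", ":Pebbles"),
    (":Pebbles", ":hasChild", ":Roxy"),
    (":Pebbles", ":hasChild", ":Chip"),
}
children = sorted(s["?child"] for s in evaluate(graph, [(":Pebbles", ":hasChild", "?child")]))
grandchildren = sorted(
    s["?grandChild"]
    for s in evaluate(graph, [(":Fred", ":hasChild", "?parent"),
                              ("?parent", ":hasChild", "?grandChild")])
)
print(children, grandchildren)
```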
#68
SPARQL query types - ASK
• ASK – test whether a query pattern has a solution
ASK WHERE {?parent :hasChild ?child}
• Returns: YES
#69
SPARQL query types - SELECT
• SELECT – returns variables & their bindings
SELECT ?parent ?child WHERE {?parent :hasChild ?child}
?parent ?child
:Pearl :Wilma
:Wilma :Pebbles
:Fred :Pebbles
:Barney :Bamm-Bamm
:Betty :Bamm-Bamm
:Pebbles :Roxy
:Pebbles :Chip
:Bamm-Bamm :Roxy
:Bamm-Bamm :Chip
#70
Components of a SPARQL query
Namespace definitions:
PREFIX : <http://www.example.org/bedrock#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
Query form + variables:
SELECT ?grandParent ?grandChild
Data sources:
FROM <http://www.example.org/bedrock#>
Query patterns & filters:
WHERE { ?grandParent :hasChild ?parent . ?parent :hasChild ?grandChild . }
Solution modifiers:
ORDER BY (?grandChild)
(Output will be pairs of grandparent and grandchild URIs, ordered alphabetically by the grandchild URIs)
#71
Graph patterns
• Basic graph patterns
– A conjunction of triple patterns
• Optional graph pattern
– Specifies optional parts of a pattern (similar to an “outer join” in SQL)
• Union graph patterns
– Specifies disjunctions (alternatives)
#72
• Find all pairs of children and parents and include the parent’s workplace if it’s specified in the data.
PREFIX : <http://www.example.org/bedrock#>
SELECT ?parent ?child ?company
WHERE {
?parent :hasChild ?child.
OPTIONAL {?parent :worksFor ?company}
}
Optional graph pattern
?parent ?child ?company
:Pearl :Wilma
:Wilma :Pebbles
:Fred :Pebbles :RockQuarry
:Barney :Bamm-Bamm :RockQuarry
:Betty :Bamm-Bamm
:Pebbles :Roxy
:Pebbles :Chip
:Bamm-Bamm :Roxy
:Bamm-Bamm :Chip
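The behaviour of OPTIONAL resembles a left outer join, which can be sketched as below. The pairs are a subset of the slide's data, `None` stands for an unbound variable, and `left_join` is a hypothetical helper written for this illustration.

```python
# (parent, child) pairs matched by the mandatory pattern ?parent :hasChild ?child
has_child = [
    (":Pearl", ":Wilma"),
    (":Fred", ":Pebbles"),
    (":Barney", ":Bamm-Bamm"),
    (":Betty", ":Bamm-Bamm"),
]
# (person, company) pairs matched by the optional pattern ?parent :worksFor ?company
works_for = [(":Fred", ":RockQuarry"), (":Barney", ":RockQuarry")]


def left_join(required, optional):
    """Keep every required row; add the optional value when one exists."""
    rows = []
    for parent, child in required:
        companies = [company for person, company in optional if person == parent]
        # With no match the row is still returned; ?company simply stays unbound.
        for company in companies or [None]:
            rows.append((parent, child, company))
    return rows


rows = left_join(has_child, works_for)
for row in rows:
    print(row)
```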
#73
Union graph pattern
• Find children of either Fred or Barney and return pairs of those children with each of their parents.
PREFIX : <http://www.example.org/bedrock#>
SELECT ?parent ?child
WHERE {
{:Fred :hasChild ?child}
UNION
{:Barney :hasChild ?child}
?parent :hasChild ?child
}
?parent ?child
:Wilma :Pebbles
:Fred :Pebbles
:Barney :Bamm-Bamm
:Betty :Bamm-Bamm
#74
Order By modifier
• Let’s order those alphabetically for the parent URIs.
PREFIX : <http://www.example.org/bedrock#>
SELECT ?parent ?child
WHERE {
{:Fred :hasChild ?child}
UNION
{:Barney :hasChild ?child}
?parent :hasChild ?child
}
ORDER BY (?parent)
?parent ?child
:Barney :Bamm-Bamm
:Betty :Bamm-Bamm
:Fred :Pebbles
:Wilma :Pebbles
#75
Filtering solutions
• Find people who are over 30 years of age.
PREFIX : <http://www.example.org/bedrock#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?age
WHERE {
?person foaf:age ?age .
FILTER (?age > 30).
} ORDER BY (?age)
?person ?age
:Fred 44
:Barney 45
Statement
Subject Predicate Object
:Fred foaf:age "44"^^xsd:integer
:Barney foaf:age "45"^^xsd:integer
:Chip foaf:age "1"^^xsd:integer
:Bamm-Bamm foaf:age "22"^^xsd:integer
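For comparison, the same filter and ordering over the four age statements can be written directly in Python. This is only a sketch of the FILTER and ORDER BY semantics over a dictionary, not of how a triplestore evaluates them.

```python
# foaf:age values from the statements table above
ages = {":Fred": 44, ":Barney": 45, ":Chip": 1, ":Bamm-Bamm": 22}

# FILTER (?age > 30) keeps matching solutions; ORDER BY (?age) sorts them ascending.
over_30 = sorted(
    ((person, age) for person, age in ages.items() if age > 30),
    key=lambda row: row[1],
)
print(over_30)
```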
#76
Aggregates
• Aggregates allow computation of values using: – COUNT, SUM, MIN, MAX, AVG, etc.
• Built around the GROUP BY operator
• For example, computing popularity in a social graph:
SELECT ?person (COUNT(?someone) AS ?popularity)
WHERE {?someone foaf:knows ?person}
GROUP BY ?person
• Prune at group level (cf. FILTER) using HAVING, e.g.:
GROUP BY ?person HAVING (COUNT(?someone) > 4)
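A sketch of what GROUP BY, COUNT and HAVING compute, using `collections.Counter` over made-up foaf:knows pairs (the names beyond the Flintstones cast are placeholders):

```python
from collections import Counter

# (someone, person) pairs standing for "?someone foaf:knows ?person" (invented data)
knows = [
    (":Barney", ":Fred"), (":Wilma", ":Fred"), (":Betty", ":Fred"),
    (":Pebbles", ":Fred"), (":Pearl", ":Fred"),
    (":Fred", ":Wilma"), (":Betty", ":Wilma"),
]

# GROUP BY ?person with COUNT(?someone) AS ?popularity
popularity = Counter(person for _, person in knows)

# HAVING (COUNT(?someone) > 4) prunes whole groups after aggregation
popular = {person: count for person, count in popularity.items() if count > 4}
print(popularity, popular)
```

FILTER removes individual solutions before grouping, whereas HAVING removes groups after the aggregate has been computed.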
#77
Expressions in SELECT clauses
• SPARQL 1.1 allows functions and expressions over variables in the SELECT clause of the query
• For example, to glue together names of spouses:
SELECT (CONCAT(?wifeName, " and ", ?husbandName, " ", ?husbandSurname) AS ?familyName)
WHERE {
?wife a :female; foaf:firstName ?wifeName; :hasSpouse ?husband.
?husband a :male; foaf:firstName ?husbandName; foaf:familyName ?husbandSurname
}
?familyName
"Wilma and Fred Flintstone"
"Betty and Barney Rubble"
"Pebbles and Bamm-Bamm Rubble"
#78
Property Paths
• SPARQL 1.0 builds graph patterns from triple patterns, where resources are separated in the graph by one arc
• SPARQL 1.1 generalizes on triple patterns to model resources separated by paths of arbitrary length
• e.g. Get all ancestors regardless of how many links away
SELECT ?ancestor WHERE {?person :hasParent+/foaf:name ?ancestor}
• e.g. Get all ancestors and oneself
SELECT ?ancestor WHERE {?person :hasParent*/foaf:name ?ancestor}
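The difference between the `+` and `*` path operators can be sketched as a graph traversal. The parent map below is derived from the family table used earlier, the final `/foaf:name` step is left out, and `ancestors` is an illustrative function rather than part of any SPARQL library.

```python
def ancestors(has_parent, person, include_self=False):
    """Follow :hasParent one or more times (+), or zero or more times (*) if include_self."""
    found = {person} if include_self else set()
    visited = {person}
    frontier = [person]
    while frontier:
        current = frontier.pop()
        for parent in has_parent.get(current, ()):
            found.add(parent)
            if parent not in visited:
                visited.add(parent)
                frontier.append(parent)
    return found


has_parent = {
    ":Roxy": [":Pebbles", ":Bamm-Bamm"],
    ":Pebbles": [":Fred", ":Wilma"],
    ":Wilma": [":Pearl"],
    ":Bamm-Bamm": [":Barney", ":Betty"],
}
plus = ancestors(has_parent, ":Roxy")                      # like :hasParent+
star = ancestors(has_parent, ":Roxy", include_self=True)   # like :hasParent*
print(sorted(plus))
```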
#79
SPARQL 1.1 Data Management
• 3 ways to modify data within a graph
INSERT DATA {
:fred foaf:name "Freddy Flintstone".
:fred foaf:firstName "Freddy" }
DELETE DATA {
:fred foaf:name "Fred Flintstone".
:fred foaf:firstName "Fred" }
DELETE {?person foaf:name "Freddy Flintstone"; foaf:firstName "Freddy".}
INSERT {?person foaf:name "Fred Flintstone"; foaf:firstName "Fred".}
WHERE {?person foaf:name "Freddy Flintstone"; foaf:firstName "Freddy".}
• 2 ways to further change the data within a graph:
LOAD <http://www.example.org/bedrock/> INTO GRAPH <http://ontotext.com/bedrock#>
CLEAR GRAPH <http://ontotext.com/bedrock#>
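The DELETE/INSERT ... WHERE form first matches the WHERE pattern and then, for each match, removes the DELETE template and adds the INSERT template. A rough Python sketch of that order of operations, with `rename` as a hypothetical helper over a set of tuples:

```python
graph = {
    (":fred", "foaf:name", "Freddy Flintstone"),
    (":fred", "foaf:firstName", "Freddy"),
    (":barney", "foaf:name", "Barney Rubble"),
}


def rename(graph, old_name, old_first, new_name, new_first):
    """Mimic DELETE/INSERT ... WHERE: match first, then delete and insert per match."""
    # WHERE: people who have both the old name and the old first name
    people = {
        s for s, p, o in graph
        if (p, o) == ("foaf:name", old_name) and (s, "foaf:firstName", old_first) in graph
    }
    updated = set(graph)
    for person in people:
        updated -= {(person, "foaf:name", old_name), (person, "foaf:firstName", old_first)}
        updated |= {(person, "foaf:name", new_name), (person, "foaf:firstName", new_first)}
    return updated


updated = rename(graph, "Freddy Flintstone", "Freddy", "Fred Flintstone", "Fred")
print(sorted(updated))
```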
#80
SPARQL 1.1 Graph Management
• A new named graph can be explicitly created
CREATE GRAPH <http://ontotext.com/bedrock#>
• … or dropped
DROP GRAPH <http://ontotext.com/bedrock#>
• … but also
COPY DEFAULT TO <http://ontotext.com/bedrock#>
MOVE DEFAULT TO <http://ontotext.com/bedrock#>
ADD DEFAULT TO <http://ontotext.com/bedrock#>
#81
LINKED OPEN DATA
#82
What is Linked Data?
• “To make the Semantic Web a reality, it is necessary to have a large volume of data available on the Web in a standard, reachable and manageable format. In addition the relationships among data also need to be made available. This collection of interrelated data on the Web can also be referred to as Linked Data. Linked Data lies at the heart of the Semantic Web.” (W3C)
• Linked Data is a set of simple principles that allows publishing, querying and browsing of RDF data, distributed across different servers
#83
Linked Data design principles
1. Unambiguous identifiers for data resources
– “Use URIs as names for things.”
2. Use the structure of the web
– “Use HTTP URIs so that people can look up the names.”
3. Make it easy to discover information about resources
– “When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).”
4. Link the data resource to related resources
– “Include links to other URIs, so that users can discover more things.”
#84
Linked Data design principles
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
What to return for a URI?
• Immediate description: triples where the URI is the subject.
• Backlinks: triples where the URI is the object.
• Related descriptions: information of interest in typical usage scenarios.
• Metadata: information such as author and licensing information.
• Syntax: RDF descriptions as RDF/XML and human-readable formats
Source: How to Publish Linked Data on the Web - Chris Bizer, Richard Cyganiak, Tom Heath.
#85
Linked Data design principles
4. Include links to other URIs, so that users can discover more things
There are several ways to reuse URIs:
Instance Level
• direct reuse
• (OWL) sameAs
• (SKOS) exactMatch, closeMatch
• (RDFS) seeAlso
Schema Level
• direct reuse of class/property
• (RDFS) sub-class/-property
• (OWL) equivalent class/property
• (SKOS) broadMatch
#86
Linked Data 5 Star
★ Data is available on the Web.
★★ Data is available as machine-readable structured data.
★★★ Non-proprietary formats are used.
★★★★ Individual data identified with open standards.
★★★★★ Data is linked to other data providers.
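The five levels can be read as a checklist. The sketch below is only an illustration, assuming the levels are cumulative (each star requires all the previous ones); the Dataset class, its field names and the example dataset are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web: bool
    machine_readable: bool
    non_proprietary_format: bool
    open_standard_identifiers: bool
    linked_to_other_data: bool


def star_rating(ds):
    """Count stars, stopping at the first criterion that is not met."""
    criteria = [
        ds.on_web,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.open_standard_identifiers,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Hypothetical dataset: published on the Web in a non-proprietary, structured format,
# but without open-standard identifiers or links to other data.
published_table = Dataset(True, True, True, False, False)
print(star_rating(published_table))
```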
#87
Linked Data evolution (2007)
(c) R. Cyganiak & A. Jentzsch
#88
Linked Data evolution (2008)
(c) R. Cyganiak & A. Jentzsch
#89
Linked Data evolution (2009)
(c) R. Cyganiak & A. Jentzsch
#90
Linked Data evolution (2010)
(c) R. Cyganiak & A. Jentzsch
#91
Linked Data evolution (2011)
(c) R. Cyganiak & A. Jentzsch
#92
Linked Data evolution (2014)
(c) R. Cyganiak & A. Jentzsch
#93
State of LOD
(c) Bizer, Cyganiak & Jentzsch
Number of triples
Number of out-links
#94
COMMONLY USED LOD DATASETS
#95
GeoNames
• The GeoNames geographical database covers all countries and contains over 11M placenames that are available for download free of charge.
#96
VIAF
• 20 National Libraries and 15 other contributors, 35M persons, organizations, places, conferences
#97
Wikidata
• Provides structured data to Wikipedias.
• Over 19M entities
#98
And many more …
• DBpedia: extracts structured data from Wikipedias.
– 4.5M en, 3M de, …
– (Wikidata: provides structured data to Wikipedias)
• Freebase: basis of Google Knowledge Graph, phasing out
• Data from Schema.org marked up websites
• Europeana: over 50M records from museums, libraries, archives and multi-media collections (Ontotext hosts the EDM SPARQL repository)
• OpenTED: EU Tenders Electronic Daily
• LinkedMDB: movies
• BBC Sports, Wildlife, Music, Programmes (started by Ontotext)
#99
Support and FAQs
Additional resources:
Ontotext:
• Community Forum and Evaluation Support: http://stackoverflow.com/questions/tagged/graphdb
• GraphDB Website and Documentation: http://graphdb.ontotext.com
• Whitepapers, Fundamentals: http://ontotext.com/knowledge-hub/fundamentals/
SPARQL, OWL, and RDF:
• RDF: http://www.w3.org/TR/rdf11-concepts/
• RDFS: http://www.w3.org/TR/rdf-schema/
• SPARQL Overview: http://www.w3.org/TR/sparql11-overview/
• SPARQL Query: http://www.w3.org/TR/sparql11-query/
• SPARQL Update: http://www.w3.org/TR/sparql11-update
#100
For Further Information
• Georgi Georgiev, Head of Global Alliances Development
– 359.882.885.636
• Ilian Uzunov, Europe Sales and Business Development
– 359.888.772.248
• Peio Popov, North America Sales and Business Development
– 1.929.239.0659
#101
Training Portfolio
#102
Introduction to Semantic Technologies
• Overview of the Semantic Technologies landscape
• Advantages to using RDF and triplestores
• Ontologies for more meaningful data
• Reasoning and inference
• Querying with SPARQL
• Linked Open Data
GraphDB for beginners
• Overview of the Semantic Technologies landscape
• Using GraphDB to cut costs and increase revenue
• Domain-specific applications and use cases of GraphDB
• Exploring data using GraphDB
• Gaining insights from SPARQL queries
GraphDB for developers
• RDF and triplestores
• Querying with SPARQL
• Making use of LOD
• Immersive introduction to GraphDB functionality
• Components and architecture
• Rulesets and inference
• GraphDB Connectors
• Plugins and query modifiers
GraphDB for administrators
• GraphDB standard operability
• Components and architecture
• Rulesets and inference
• Performance optimizations
• Users and access rights
• Setup & maintenance
• Common caveats
Semantic use cases, solutions & applications
• How semantic products & solutions can
– cut costs and increase revenue along a market vertical
– increase transparency and accessibility, enrich own data and make use of LOD
• Calculate adoption time and cost; cost of ownership; revenue increase opportunities
• Domain-specific solutions: reference architectures & key technical components
Text analytics and semantics with GATE
• Extract information from text
• Index and query semantically annotated data with GATE Mimir
• Train Machine Learning algorithms for Information Extraction
• Setup and use GATE Cloud
• Develop GATE applications
• Work with GATE Embedded
Eclipse RDF4J
• Overview of Semantic Technologies
• Processing and handling RDF data
• Repository configuration
• Programming with RDF4J
• Extending functionality through
– triplestore storage (GraphDB)
– free text search (LuceneSail)
– geospatial search (GeoSPARQL)
– meta-modeling (SPIN)
Announcement
Free webinar: Integrating siloed structured and unstructured data with GraphDB™
10 November 2016 | 11am EDT | 4pm BST | 6pm EEST
Topics covered:
• Installing GraphDB™ and configuring your repository
• Using simple ontologies for automated reasoning on data
• Transforming, cleaning up and linking your heterogeneous data with OntoRefine
• Loading all of your distributed data in one unified data layer
• Querying and updating your data with SPARQL
• Data visualization with GraphDB™
• More ways to make use of GraphDB™’s capabilities, example use cases
• Comparison of GraphDB™ editions
• Overview of Ontotext’s training portfolio
#103
The End