GRAPH DATABASE

Ernesto Damiani and Paolo Ceravolo

Università degli Studi di Milano, Dipartimento di Informatica




WHAT IS A GRAPH?

‣ Formally, a graph is just a collection of vertices and edges

‣ Graphs represent entities as nodes and the ways in which those entities relate as relationships

‣ This general-purpose, expressive structure allows us to model all kinds of scenarios

‣ Graphs are extremely useful in understanding a wide diversity of datasets in fields such as science, government, and business

‣ Represent networks: social structures, topological relationships

‣ Represent a sequence of events

‣ Represent relationships between concepts: hyperonymy, hyponymy, meronymy


THE LABELED GRAPH MODEL

‣ The most popular form of graph model is the Labeled Graph Model

‣ It contains nodes and relationships

‣ Nodes contain properties (key-value pairs)

‣ Nodes can be labeled with one or more labels

‣ Relationships are named and directed, and always have a start and end node

‣ Relationships can also contain properties
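The model described above can be sketched in a few lines of Python. This is only an illustration: the class names `Node` and `Relationship` and the sample data are assumptions made for this example, and they say nothing about how a real graph database stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries zero or more labels and a set of key-value properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship is named, directed, and always has a start and an end node."""
    name: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: two people and one directed KNOWS relationship.
alice = Node(labels={"Person"}, properties={"name": "Alice"})
bob = Node(labels={"Person", "Customer"}, properties={"name": "Bob"})
knows = Relationship("KNOWS", start=alice, end=bob, properties={"since": 2015})

print(knows.start.properties["name"], f"-[:{knows.name}]->", knows.end.properties["name"])
```

Note that the direction lives in the relationship itself (`start` and `end`), while labels and properties are free-form, which is what lets the model grow without a fixed schema.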


GRAPH DATABASE MANAGEMENT SYSTEM

‣ A Graph Database Management System is an online database management system

‣ CRUD (Create, Read, Update, and Delete) operations

‣ OLTP (Online Transaction Processing) transactional systems

‣ OLAP (Online Analytical Processing)

‣ Management systems that address scalability are also available


GRAPH DATABASE MANAGEMENT SYSTEM

‣ There are two properties of graph databases we should consider when investigating graph database technologies:

‣ The underlying storage

‣ Some graph databases use native graph storage that is optimised and designed for storing and managing graphs

‣ The processing engine

‣ Native graph processing requires that a graph database use index-free adjacency, meaning that connected nodes physically “point” to each other in the database


GRAPH DATABASE MANAGEMENT SYSTEM

‣ Index-free adjacency

‣ A graph processing engine is said to be native if it implements index-free adjacency

‣ An index lookup implies O(log n) computational complexity, while following an adjacent relationship is O(1)

‣ The cost of queries is not dependent on the size of the graph but on the size of the traversed path

‣ With index-free adjacency, bidirectional joins are effectively precomputed and stored in the database as relationships
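The difference between the two lookup strategies can be sketched as follows. This is a simplified illustration, not a description of any real storage engine: the sorted edge list stands in for a global index, and the `Node` class, `neighbours_via_index` function, and sample data are assumptions made for this example.

```python
from bisect import bisect_left


class Node:
    """Node that holds direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct references to adjacent Node objects


def neighbours_via_index(sorted_edges, name):
    """Non-native lookup: binary-search a global sorted edge index, O(log n) per hop."""
    i = bisect_left(sorted_edges, (name, ""))
    found = []
    while i < len(sorted_edges) and sorted_edges[i][0] == name:
        found.append(sorted_edges[i][1])
        i += 1
    return found


# Hypothetical data: a global edge index sorted by source node name.
edge_index = sorted([("ian", "emil"), ("jim", "emil"), ("jim", "ian")])

# The same graph with index-free adjacency: each node points at its neighbours.
emil, ian, jim = Node("emil"), Node("ian"), Node("jim")
jim.neighbours = [emil, ian]
ian.neighbours = [emil]

print(neighbours_via_index(edge_index, "jim"))  # found by searching the index
print([n.name for n in jim.neighbours])         # found by following references
```

Both calls return the same neighbours. The index search depends on the total number of edges, while following the stored references depends only on how many neighbours the node has.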


GRAPH COMPUTE ENGINES

‣ A graph compute engine is a technology that enables global graph computational algorithms to be run against large datasets

‣ The architecture includes a system of record (SOR) database with OLTP properties

‣ Periodically, an Extract, Transform, and Load (ETL) job moves data from the system of record database into the graph compute engine for offline querying and analysis
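The pipeline described above can be sketched in miniature. Everything here is an assumption made for illustration: the list `system_of_record` stands in for an OLTP database, `etl` for the periodic load job, and connected components for a "global" algorithm that has to look at the whole graph.

```python
from collections import defaultdict, deque

# Hypothetical system-of-record rows: pairs of connected customers.
system_of_record = [("ann", "bob"), ("bob", "cid"), ("dan", "eve")]


def etl(rows):
    """Extract rows, transform them into an undirected adjacency map, load it in memory."""
    graph = defaultdict(set)
    for src, dst in rows:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def connected_components(graph):
    """Global algorithm: visits every node, unlike a localized OLTP traversal."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


snapshot = etl(system_of_record)  # periodic, offline copy of the live data
print(connected_components(snapshot))
```

The analysis runs against the snapshot, so it never competes with transactions on the system of record, at the price of working on slightly stale data.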


WHY USE GRAPH DATABASES

‣ Performance

‣ In contrast to relational databases, where join-intensive query performance deteriorates as the dataset gets bigger, with a graph database performance tends to remain relatively constant, even as the dataset grows. This is because queries are localized to a portion of the graph

‣ Flexibility

‣ Structure and schema can emerge with our growing understanding of the problem space

‣ Graphs are naturally additive, meaning we can add new kinds of relationships, new nodes, new labels, and new subgraphs to an existing structure without disturbing existing queries and application functionality

‣ Semantic lifting and expansion are naturally implemented on graphs

‣ Integration with heterogeneous sources is also more natural in graph databases


WHY USE GRAPH DATABASES

‣ Agility

‣ Governance is typically applied in a programmatic fashion, using tests to drive out the data model and queries, as well as assert the business rules that depend upon the graph


RELATIONAL DATABASES LACK RELATIONSHIPS

‣ Join tables add accidental complexity; they mix business data with foreign key metadata

‣ Foreign key constraints add additional development and maintenance overhead

‣ Sparse tables with nullable columns require special checking in code

‣ Several expensive joins are often needed

‣ Reciprocal queries are even more costly


RELATIONAL DATABASES LACK RELATIONSHIPS

‣ Relational databases struggle with highly connected domains

‣ To understand the cost of performing connected queries in a relational database, we’ll look at some simple and not-so-simple queries in a social network domain

-- Who are Bob's friends?
SELECT p1.Person
FROM Person p1
JOIN PersonFriend ON PersonFriend.FriendID = p1.ID
JOIN Person p2 ON PersonFriend.PersonID = p2.ID
WHERE p2.Person = 'Bob'


RELATIONAL DATABASES LACK RELATIONSHIPS


-- Reciprocal query: who is friends with Bob?
SELECT p1.Person
FROM Person p1
JOIN PersonFriend ON PersonFriend.PersonID = p1.ID
JOIN Person p2 ON PersonFriend.FriendID = p2.ID
WHERE p2.Person = 'Bob'
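The two queries on these slides can be tried out with Python's built-in sqlite3 module. The table layout follows the column names used in the queries, while the rows are hypothetical sample data chosen so that the two directions give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Person TEXT);
    CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Zach');
    -- Hypothetical rows: Alice lists Bob as a friend, Bob lists Zach.
    INSERT INTO PersonFriend VALUES (1, 2), (2, 3);
""")

BOBS_FRIENDS = """
    SELECT p1.Person
    FROM Person p1
    JOIN PersonFriend ON PersonFriend.FriendID = p1.ID
    JOIN Person p2 ON PersonFriend.PersonID = p2.ID
    WHERE p2.Person = 'Bob'
"""

FRIENDS_WITH_BOB = """
    SELECT p1.Person
    FROM Person p1
    JOIN PersonFriend ON PersonFriend.PersonID = p1.ID
    JOIN Person p2 ON PersonFriend.FriendID = p2.ID
    WHERE p2.Person = 'Bob'
"""

bobs_friends = [row[0] for row in conn.execute(BOBS_FRIENDS)]
friends_with_bob = [row[0] for row in conn.execute(FRIENDS_WITH_BOB)]
print(bobs_friends)      # people Bob lists as friends
print(friends_with_bob)  # people who list Bob as a friend
```

Each direction needs two joins through the PersonFriend table. Going one level deeper (friends of friends) adds another pair of joins per hop, which is where the cost starts to grow.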


NOSQL DATABASES ALSO LACK RELATIONSHIPS

‣ Seeing a reference to order: 1234 in the record beginning user: Alice, we infer a connection between user: Alice and order: 1234. This gives us false hope that we can use keys and values to manage graphs

‣ Because there are no identifiers that “point” backward (the foreign aggregate “links” are not reflexive, of course), we lose the ability to run other interesting queries on the database

‣ Aggregate stores do not maintain consistency of connected data, nor do they support what is known as index-free adjacency

‣ Aggregate stores must employ inherently latent methods for creating and querying relationships outside the data model
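The missing backward pointer can be illustrated with a plain dictionary standing in for an aggregate store. The keys, the record layout, and the helper names `orders_of` and `user_of` are assumptions made for this sketch.

```python
# Hypothetical aggregate store: each user record embeds the keys of its orders.
users = {
    "user:Alice": {"orders": ["order:1234", "order:5678"]},
    "user:Bob": {"orders": ["order:9012"]},
}


def orders_of(user_key):
    """Forward direction: one key lookup, because the link is stored in the record."""
    return users[user_key]["orders"]


def user_of(order_key):
    """Backward direction: nothing points from the order to the user,
    so every user record has to be scanned."""
    for user_key, record in users.items():
        if order_key in record["orders"]:
            return user_key
    return None


print(orders_of("user:Alice"))
print(user_of("order:9012"))
```

The forward question is a single lookup. The backward question scans the whole store unless the application builds and maintains a second, reverse structure on its own.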


PERFORMANCE

‣ Graph databases are designed to traverse graphs, so they perform well when querying interconnected domains



QUERYING GRAPHS

‣ Cypher is an expressive (yet compact) graph database query language

‣ Other graph databases have other means of querying data. Many, including Neo4j, support the RDF query language SPARQL and the imperative, path-based query language Gremlin

(emil)<-[:KNOWS]-(jim)-[:KNOWS]->(ian)-[:KNOWS]->(emil)


QUERYING GRAPHS

(emil:Person {name:'Emil'})
  <-[:KNOWS]-(jim:Person {name:'Jim'})
  -[:KNOWS]->(ian:Person {name:'Ian'})
  -[:KNOWS]->(emil)


QUERYING GRAPHS

MATCH (a:Person {name:'Jim'})-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
RETURN b, c


QUERYING GRAPHS

MATCH (a:Person)-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
WHERE a.name = 'Jim'
RETURN b, c
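To make clear what this pattern asks for, here is a rough Python equivalent run against the Emil, Jim, and Ian graph from the earlier slides. The dictionary `KNOWS` and the function `mutual_friend_pairs` are names invented for this sketch, and a real Cypher engine evaluates the pattern quite differently.

```python
# Directed KNOWS relationships from the pattern shown earlier:
# Jim knows Emil, Jim knows Ian, Ian knows Emil.
KNOWS = {
    "Jim": {"Emil", "Ian"},
    "Ian": {"Emil"},
    "Emil": set(),
}


def mutual_friend_pairs(a):
    """Return (b, c) pairs where a knows b, b knows c, and a also knows c."""
    pairs = []
    for b in sorted(KNOWS.get(a, ())):
        for c in sorted(KNOWS.get(b, ())):
            if c in KNOWS[a]:
                pairs.append((b, c))
    return pairs


print(mutual_friend_pairs("Jim"))
```

For Jim the only match is b = Ian and c = Emil: Jim knows Ian, Ian knows Emil, and Jim also knows Emil directly.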


QUERYING GRAPHS

‣ Cypher Clauses

‣ WHERE: Provides criteria for filtering pattern matching results.

‣ CREATE and CREATE UNIQUE: Create nodes and relationships (CREATE UNIQUE is deprecated in later Neo4j versions in favour of MERGE).

‣ MERGE: Ensures that the supplied pattern exists in the graph, either by reusing existing nodes and relationships that match the supplied predicates, or by creating new nodes and relationships.

‣ DELETE: Removes nodes and relationships (in current Cypher, properties and labels are removed with REMOVE).

‣ SET: Sets property values.

‣ FOREACH: Performs an updating action for each element in a list.

‣ UNION: Merges results from two or more queries.

‣ WITH: Chains subsequent query parts and forwards results from one to the next. Similar to piping commands in Unix.
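The difference between CREATE and MERGE is the one most often misunderstood, so here is a small Python analogy. It covers single nodes only, whereas the real MERGE clause matches whole patterns including relationships. The functions `create` and `merge` are illustrative stand-ins, not Neo4j APIs.

```python
nodes = []  # each node is a (label, properties) pair


def create(label, **props):
    """CREATE-like behaviour: always adds a new node."""
    node = (label, dict(props))
    nodes.append(node)
    return node


def merge(label, **props):
    """MERGE-like behaviour: reuse a node matching label and properties, else create it."""
    for node in nodes:
        node_label, node_props = node
        if node_label == label and all(node_props.get(k) == v for k, v in props.items()):
            return node
    return create(label, **props)


first = merge("Person", name="Jim")
second = merge("Person", name="Jim")  # finds the existing node
create("Person", name="Jim")          # unconditionally adds a duplicate
print(first is second, len(nodes))
```

Running the merge twice leaves a single node, while the extra create adds a second one with the same properties.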


INCREMENTAL MODELING

‣ Graph databases provide for the smooth evolution of a data model

‣ We develop the data model feature by feature, user story by user story


INCREMENTAL MODELING

‣ If we need to find all the events that have occurred over a specific period, we can build a timeline tree
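A timeline tree can be sketched as nested year, month, and day levels with events hanging off the day level. This is a minimal illustration using dictionaries. In a graph database each level would be a node connected by relationships, and the event names below are made up.

```python
from datetime import date

# Hypothetical timeline tree: year -> month -> day -> list of events.
timeline = {}


def add_event(day, event):
    """Attach an event under its year, month, and day nodes, creating them if needed."""
    timeline.setdefault(day.year, {}).setdefault(day.month, {}).setdefault(day.day, []).append(event)


def events_between(start, end):
    """Walk only the year and month branches that overlap the period [start, end]."""
    found = []
    for year in sorted(timeline):
        if not start.year <= year <= end.year:
            continue
        for month in sorted(timeline[year]):
            if not (start.year, start.month) <= (year, month) <= (end.year, end.month):
                continue
            for day in sorted(timeline[year][month]):
                if start <= date(year, month, day) <= end:
                    found.extend(timeline[year][month][day])
    return found


add_event(date(2017, 8, 3), "burglary report")
add_event(date(2017, 8, 20), "vehicle stop")
add_event(date(2017, 9, 1), "phone call")
print(events_between(date(2017, 8, 10), date(2017, 9, 30)))
```

Because whole year and month branches are skipped when they fall outside the period, the query touches only the part of the tree that can contain matching events.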


INCREMENTAL MODELING

‣ The carousel fraud


QUERYING GRAPHS

‣ POLE MODEL

‣ The POLE data model focuses on four basic types of entities and the relationships between them: Persons, Objects, Locations, and Events

Greater Manchester, UK from August 2017


INTEGRATION WITH ONTOLOGIES

‣ An ontology is a formal, explicit specification of a shared conceptualization that is characterized by high semantic expressiveness required for increased complexity (Feilmayr and Wöß, 2016)

‣ Ontologies are typically represented as graphs

‣ Web Ontology Language (OWL) is typically represented using RDF triples

‣ Ontologies contain inference rules that can be applied to a knowledge base


INTEGRATION WITH ONTOLOGIES

‣ Taking an example from the LUBM benchmark (Lehigh University Benchmark), a student is derived to be an attendee if he or she takes some course

‣ That is, when he or she matches the following ontological rule: Student and (takesCourse some) SubClassOf Attendee

‣ Any experienced Neo4j programmer may rub his or her hands, since this rule can be translated straightforwardly into the following Cypher expression:

match (x:Student)-[:takesCourse]->() set x:Attendee

‣ That is perfectly possible, but it could become cumbersome in the case of deeply nested rules that may also depend on each other

‣ For instance, the Cypher expression misses the subclasses of Student, such as UndergraduateStudent. Strictly speaking, the expression above should therefore read:

match (x)-[:takesCourse]->() where x:Student or x:UndergraduateStudent set x:Attendee
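Listing every subclass by hand does not scale, which is the point of the last bullet. One way around it is to walk the class hierarchy instead, as in the sketch below. The hierarchy, node names, and course names are hypothetical sample data, and the helpers `is_a` and `infer_attendees` are invented for this example rather than taken from any ontology toolkit.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {"UndergraduateStudent": "Student", "GraduateStudent": "Student"}

# Hypothetical nodes with their labels, and takesCourse relationships.
labels = {
    "ann": {"UndergraduateStudent"},
    "bob": {"Student"},
    "cid": {"Professor"},
    "dan": {"GraduateStudent"},
}
takes_course = {"ann": ["Databases"], "cid": ["Pottery"], "dan": []}


def is_a(label, target):
    """True if label equals target or is a (transitive) subclass of it."""
    while label is not None:
        if label == target:
            return True
        label = SUBCLASS_OF.get(label)
    return False


def infer_attendees():
    """Apply the rule: Student and (takesCourse some) SubClassOf Attendee."""
    for node, node_labels in labels.items():
        is_student = any(is_a(label, "Student") for label in node_labels)
        if is_student and takes_course.get(node):
            node_labels.add("Attendee")


infer_attendees()
print(sorted(n for n, ls in labels.items() if "Attendee" in ls))
```

Only ann becomes an Attendee: she is a student through the subclass link and takes a course, while bob and dan take no course and cid is not a student.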