A Whirlwind Tour of Graph Databases

Post on 18-Mar-2018

Category:

Data & Analytics


GraphDB Whirlwind Tour

Michael Hunger

Code Days - OOP

(Michael Hunger)-[:WORKS_FOR]->(Neo4j)

michael@neo4j.com | @mesirii | github.com/jexp | jexp.de/blog

Michael Hunger - Head of Developer Relations @Neo4j

Why Graphs?

Use Cases

Data Model

Query-ing

Neo4j

Why Graphs?

Because the World is a Graph!

Everything and Everyone is Connected

• people, places, events
• companies, markets
• countries, history, politics
• sciences, art, teaching
• technology, networks, machines, applications, users
• software, code, dependencies, architecture, deployments
• criminals, fraudsters and their behavior

Value from Relationships

Value from Data Relationships: Common Use Cases

Internal Applications

Master Data Management

Network and IT Operations

Fraud Detection

Customer-Facing Applications

Real-Time Recommendations

Graph-Based Search

Identity and Access Management

The Rise of Connections in Data

• Networks of People: e.g., employees, customers, suppliers, partners, influencers
• Business Processes: e.g., risk management, supply chain, payments
• Knowledge Networks: e.g., enterprise content, domain-specific content, eCommerce content

Data connections are increasing as rapidly as data volumes


Harnessing Connections Drives Business Value

Enhanced Decision Making

Hyper Personalization

Massive Data Integration

Data Driven Discovery & Innovation

Product Recommendations

Personalized Health Care

Media and Advertising

Fraud Prevention

Network Analysis

Law Enforcement

Drug Discovery

Intelligence and Crime Detection

Product & Process Innovation

360 view of customer

Compliance

Optimize Operations

Connected Data at the Center

AI & Machine Learning

Price optimization

Product Recommendations

Resource allocation

Digital Transformation Megatrends

Graph Databases Are HOT

Lots of Choice

Newcomers in the last 3 years

• DSE Graph

• Agens Graph

• IBM Graph

• JanusGraph

• Tibco GraphDB

• Microsoft CosmosDB

• TigerGraph

• MemGraph

• AWS Neptune

• SAP HANA Graph

Database Technology Architectures

Graph DB

Connected Data / Discrete Data

Relational DBMS / Other NoSQL

Right Tool for the Job

The impact of Graphs

How Graphs are changing the World

GRAPHS FOR GOOD

Neo4j ICIJ Distribution

Better Health with Graphs

Cancer Research - Candiolo Cancer Institute

“Our application relies on complex hierarchical data, which required a more flexible model than the one provided by the traditional relational database model,” said Andrea Bertotti, MD

neo4j.com/case-studies/candiolo-cancer-institute-ircc/

Graph Databases in Healthcare and Life Sciences

14 Presenters from all around Europe on:

• Genome
• Proteome
• Human Pathway
• Reactome
• SNP
• Drug Discovery
• Metabolic Symbols
• ...

neo4j.com/blog/neo4j-life-sciences-healthcare-workshop-berlin/

DISRUPTION WITH GRAPHS

BETTER BUSINESS WITH GRAPHS

Real-Time Recommendations

Fraud Detection

Network & IT Operations

Master Data Management

Knowledge Graph

Identity & Access Management

Common Graph Technology Use Cases

AirBnb

Handling Large Graph Work Loads for Enterprises

Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM WebSphere Commerce

Marriott’s Real-time Pricing Engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database

Handling Package Routing in Real-Time
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second

Software

Financial Services Telecom

Retail & Consumer Goods

Media & Entertainment Other Industries

Airbus

NEW INSIGHTS WITH GRAPHS

Machine Learning is Based on Graphs

The Property Graph: Model, Import, Query

The Whiteboard Model Is the Physical Model

Eliminates Graph-to-Relational Mapping

In your data model: bridge the gap between business and IT models

In your application: greatly reduce the need for application code

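[Figure: example property graph. One Person node has the properties name: "Dan", born: May 29, 1970, twitter: "@dan"; another has name: "Ann", born: Dec 5, 1975; a CAR node has brand: "Volvo", model: "V70"; one relationship carries the property since: Jan 10, 2011]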

Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties
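The components listed above can be sketched as a small in-memory structure. This is only an illustration in Python, not how Neo4j stores data; the class names `Node` and `Relationship` and the helper `outgoing` are hypothetical, and the sample data reuses the Dan and Ann example from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph: a set of labels plus name-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed link from `start` to `end`; may carry properties too."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```

Direction and type live on the relationship itself, which is why a pattern such as "Dan LOVES whom?" needs no join table.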

[Figure: two PERSON nodes connected by two LOVES relationships and one LIVES WITH relationship]

Cypher: Powerful and Expressive Query Language

MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})

[Diagram annotations: each (:Person {name: …}) is a NODE with a LABEL and a PROPERTY; LOVES is the relationship between the nodes Dan and Ann]

Relational Versus Graph Models

[Figure: Relational Model versus Graph Model. Relational side: Person, Person-Friend and Friend tables. Graph side: the nodes ANDREAS, TOBIAS, MICA and DELIA connected by KNOWS relationships]

Retail ...

Recommendations

Our starting point – Northwind ER

Building Relationships in Graphs

(:Customer)-[:ORDERED]->(:Order)

Locate Foreign Keys

(FKs)-[:BECOME]->(Relationships) & Correct Directions

Drop Foreign Keys

Find the Join Tables

Simple Join Tables Become Relationships

Attributed Join Tables Become Relationships with Properties
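The conversion steps above (foreign keys become relationships, attributed join tables become relationships with properties) can be sketched in a few lines of Python. The rows below are hypothetical, heavily trimmed stand-ins for Northwind tables, and the tuple layout for nodes and relationships is an assumption made only for illustration.

```python
# Hypothetical rows in the style of the Northwind tables.
customers = [{"CustomerID": "C1", "CompanyName": "Delicatessen"}]
orders = [{"OrderID": 10, "CustomerID": "C1"}]
products = [{"ProductID": 7, "ProductName": "Chocolat"}]
order_details = [{"OrderID": 10, "ProductID": 7, "Quantity": 3}]

nodes = {}          # (label, key) -> properties
relationships = []  # (start, type, end, properties)

for c in customers:
    nodes[("Customer", c["CustomerID"])] = {"companyName": c["CompanyName"]}

for p in products:
    nodes[("Product", p["ProductID"])] = {"productName": p["ProductName"]}

for o in orders:
    # The foreign key CustomerID is dropped from the node and becomes
    # an ORDERED relationship pointing from the customer to the order.
    nodes[("Order", o["OrderID"])] = {"orderID": o["OrderID"]}
    relationships.append(
        (("Customer", o["CustomerID"]), "ORDERED", ("Order", o["OrderID"]), {})
    )

for od in order_details:
    # Each row of the attributed join table becomes one INCLUDES
    # relationship; its extra columns become relationship properties.
    relationships.append(
        (("Order", od["OrderID"]), "INCLUDES", ("Product", od["ProductID"]),
         {"quantity": od["Quantity"]})
    )

for rel in relationships:
    print(rel)
```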

(One) Northwind Graph Model

(:You)-[:QUERY]->(:Data)

in a graph

Who bought Chocolat?

You all know SQL

SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od ON (o.OrderID = od.OrderID)
JOIN products AS p ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Chocolat'

Apache Tinkerpop 3.3.x - Gremlin

g = graph.traversal();
g.V().hasLabel('Product')
  .has('productName','Chocolat')
  .in('INCLUDES')
  .in('ORDERED')
  .values('companyName').dedup();

W3C Sparql

PREFIX sales_db: <http://sales.northwind.com/>

SELECT DISTINCT ?company_name WHERE {
  ?c sales_db:CompanyName ?company_name .
  ?c sales_db:ORDERED ?o .
  ?o sales_db:ITEMS ?od .
  ?od sales_db:INCLUDES ?p .
  ?p sales_db:ProductName "Chocolat" .
}

openCypher

MATCH (c:Customer)-[:ORDERED]->(o)-[:INCLUDES]->(p:Product)

WHERE p.productName = 'Chocolat'

RETURN DISTINCT c.companyName

Basic Pattern: Customer's Orders?

MATCH (:Customer {customerName: "Delicatessen"})-[:ORDERED]->(order:Order) RETURN order

[Diagram annotations: (:Customer {…}) is a NODE with a LABEL and a PROPERTY; -[:ORDERED]-> is the REL; (order:Order) is a NODE with a VAR and a LABEL]

Basic Query: Customer's Orders?

MATCH (c:Customer)-[:ORDERED]->(order)

WHERE c.customerName = 'Delicatessen'

RETURN *

Basic Query: Customer's Frequent Purchases?

MATCH (c:Customer)-[:ORDERED]->()-[:INCLUDES]->(p:Product)

WHERE c.customerName = 'Delicatessen'

RETURN p.productName, count(*) AS freq
ORDER BY freq DESC LIMIT 10;

openCypher - Recommendation

MATCH (c:Customer)-[:ORDERED]->(o1)-[:INCLUDES]->(p),
      (peer)-[:ORDERED]->(o2)-[:INCLUDES]->(p),
      (peer)-[:ORDERED]->(o3)-[:INCLUDES]->(reco)
WHERE c.customerId = $customerId
  AND NOT (c)-[:ORDERED]->()-[:INCLUDES]->(reco)
RETURN reco.productName, count(*) AS freq
ORDER BY freq DESC LIMIT 10
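The idea behind the recommendation pattern (find peers who bought something I bought, then suggest what they bought and I did not) can be sketched in plain Python. The purchase data and the function name `recommend` are hypothetical. This simplified version counts one vote per peer, whereas the Cypher query counts every matching path, so the frequencies would generally differ.

```python
from collections import Counter

# Hypothetical purchase history: customer id -> set of product names.
purchases = {
    "alice": {"Chocolat", "Tea"},
    "bob": {"Chocolat", "Coffee", "Tea"},
    "carol": {"Chocolat", "Coffee", "Biscuits"},
    "dave": {"Rice"},
}


def recommend(customer_id, limit=10):
    """Rank products bought by peers who share at least one purchase."""
    mine = purchases[customer_id]
    freq = Counter()
    for peer, theirs in purchases.items():
        if peer == customer_id or not (mine & theirs):
            continue  # skip the customer and anyone with no shared product
        for product in theirs - mine:  # only products not yet bought
            freq[product] += 1
    return freq.most_common(limit)


print(recommend("alice"))
```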

Product Cross-Sell

MATCH (:Product {productName: 'Chocolat'})<-[:INCLUDES]-(:Order)<-[:SOLD]-(employee)-[:SOLD]->(o2)-[:INCLUDES]->(cross:Product)
RETURN employee.firstName, cross.productName, count(distinct o2) AS freq
ORDER BY freq DESC LIMIT 5;

openCypher

openCypher is a community effort to evolve Cypher and to make it the most useful language for querying property graphs

openCypher implementations

SAP Hana Graph, Redis, Agens Graph, Cypher.PL, Neo4j

github.com/opencypher Language Artifacts

● Cypher 9 specification
● ANTLR and EBNF Grammars
● Formal Semantics (SIGMOD)
● TCK (Cucumber test suite)
● Style Guide

Implementations & Code

● openCypher for Apache Spark
● openCypher for Gremlin
● open source frontend (parser)
● ...

Cypher 10

● Next version of Cypher

● Actively working on natural language specification

● New features
  ○ Subqueries
  ○ Multiple graphs
  ○ Path patterns
  ○ Configurable pattern matching semantics

Extending Neo4j

Extending Neo4j: User Defined Procedures & Functions

[Diagram: applications connect over Bolt to the Neo4j execution engine, which runs user defined procedures and user defined functions]

User Defined Procedures & Functions let you write custom code that is:

• Written in any JVM language

• Deployed to the Database

• Accessed by applications via Cypher

Procedure Examples

Built-In
• Metadata Information
• Index Management
• Security
• Cluster Information
• Query Listing & Cancellation
• ...

Libraries
• APOC (std library)
• Spatial
• RDF (neosemantics)
• NLP
• ...

neo4j.com/developer/procedures-functions

Example: Data(base) Integration

Graph Analytics

Neo4j Graph Algorithms

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions”

The Impact of Connected Data

Existing Options (so far)

• Data Processing
  • Spark with GraphX, Flink with Gelly
  • Gremlin Graph Computer
• Dedicated Graph Processing
  • Urika, GraphLab, Giraph, Mosaic, GPS, Signal-Collect, Gradoop
• Data Scientist Toolkit
  • igraph, NetworkX, Boost in Python, R, C

Goal: Iterate Quickly

•Combine data from sources into one graph

•Project to relevant subgraphs

•Enrich data with algorithms

•Traverse, collect, filter, aggregate with queries

•Visualize, Explore, Decide, Export

•From all APIs and Tools

Usage

1. Call as Cypher procedure
2. Pass in specification (Label, Prop, Query) and configuration
3. The ~.stream variant returns (a lot of) results

CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score

4. The non-stream variant writes results to the graph and returns statistics

CALL algo.<name>('Label','TYPE',{conf})

Cypher Projection

Pass in Cypher statements for the node and relationship lists.

CALL algo.<name>(
  'MATCH ... RETURN id(n)',
  'MATCH (n)-->(m)
   RETURN id(n) AS source, id(m) AS target',
  {graph:'cypher'})

DEMO: OOP

Neo4j Fits into Your Environment

[Diagram: architecture overview. Labels: Development; Application; End User; Data Storage and Business Rules Execution (Graph Database Cluster of Neo4j instances); Data Mining and Aggregation (Ad Hoc Analysis, Bulk Analytic Infrastructure, Graph Compute Engine, EDW, …); Data Scientist; Databases (Relational, NoSQL, Hadoop)]

Official Language Drivers

• Foundational drivers for popular programming languages

• Bolt: streaming binary wire protocol

• Authoritative mapping to native type system, uniform across drivers

• Pluggable into richer frameworks

JavaScript, Java, .NET, Python, PHP, ...

Drivers

Bolt

Bolt + Official Language Drivers

http://neo4j.com/developer/ http://neo4j.com/developer/language-guides/

Using Bolt: Official Language Drivers look all the same

With JavaScript

var neo4j = require("neo4j-driver").v1; // 1.x driver API; later versions export driver() directly

var driver = neo4j.driver("bolt://localhost");

var session = driver.session();

var result = session.run("MATCH (u:User) RETURN u.name");

neo4j.com/developer/spring-data-neo4j

Spring Data Neo4j Neo4j OGM

@NodeEntity
public class Talk {
  @Id @GeneratedValue
  Long id;
  String title;
  Slot slot;
  Track track;
  @Relationship(type="PRESENTS", direction=INCOMING)
  Set<Person> speaker = new HashSet<>();
}

Spring Data Neo4j Neo4j OGM

interface TalkRepository extends Neo4jRepository<Talk, Long> {

  @Query("MATCH (t:Talk)<-[rating:RATED]-(user) WHERE t.id = {talkId} RETURN rating")
  List<Rating> getRatings(@Param("talkId") Long talkId);

  List<Talk> findByTitleContaining(String title);
}

github.com/neo4j-contrib/neo4j-spark-connector

Neo4j Spark Connector

github.com/neo4j-contrib/neo4j-jdbc

Neo4j JDBC Driver

Neo4j

THE Graph Database Platform

Graph Transactions

Graph Analytics

Data Integration

Development & Admin

Analytics Tooling

Drivers & APIs Discovery & Visualization

Developers

Admins

Applications Business Users

Data Analysts

Data Scientists

Neo4j: Graph Platform

Real-time Transactional and Analytic Processing
• Operational workloads
• Analytics workloads

Discovery and Visualization
• Interactive graph exploration
• Graph representation of data

Agility
• Native property graph model
• Dynamic schema

Developer Productivity
• Cypher - Declarative query language
• Procedural language extensions
• Worldwide developer community

Hardware efficiency
• 10x less CPU with index-free adjacency
• 10x less hardware than other platforms

Performance
• Index-free adjacency
• Millions of hops per second

Index-free adjacency ensures lightning-fast retrieval of data and relationships

Native Graph Architecture

Index-free adjacency: unlike other database models, Neo4j connects data as it is stored
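A conceptual sketch of what index-free adjacency means, in Python. It is not Neo4j's storage format; the class and function names are hypothetical. The first traversal follows direct references held by each node, so its cost depends on the relationships touched. The second mimics a join-table lookup, where every hop has to search the full edge table (or an index over it).

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references to adjacent nodes


def connect(a, b):
    a.neighbors.append(b)


def hops_via_adjacency(start, depth):
    """Follow stored references; no global lookup structure is consulted."""
    frontier = [start]
    for _ in range(depth):
        frontier = [n for node in frontier for n in node.neighbors]
    return [n.name for n in frontier]


def hops_via_join_table(start_name, depth, edge_rows):
    """Relational-style contrast: each hop scans the whole edge table."""
    frontier = [start_name]
    for _ in range(depth):
        frontier = [dst for name in frontier
                    for (src, dst) in edge_rows if src == name]
    return frontier


# Hypothetical four-node graph: a -> b, b -> c, b -> d.
a, b, c, d = (Node(x) for x in "abcd")
connect(a, b)
connect(b, c)
connect(b, d)
edge_rows = [("a", "b"), ("b", "c"), ("b", "d")]

print(hops_via_adjacency(a, 2))
print(hops_via_join_table("a", 2, edge_rows))
```

Both functions return the same nodes; the difference is that the second one's work per hop grows with the size of `edge_rows`, while the first only touches the neighbors it visits.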

Neo4j Query Planner

Cost based Query Planner since Neo4j

• Uses transactional database statistics

• High performance Query Engine

• Bytecode compiled queries

• Future: Parallelism


Architecture Components

• Index-Free Adjacency: in memory and on flash/disk
• ACID Foundation: required for safe writes
• Full-Stack Clustering: causal consistency
• Security
• Language, Drivers, Tooling: developer experience, graph efficiency
• Graph Engine: cost-based optimizer, graph statistics, Cypher runtime
• Hardware Optimizations: for next-gen infrastructure

Neo4j – allows you to connect the dots

• Was built to efficiently store, query and manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable on few machines

High Query Performance: Some Numbers

• Traverse 2-4M+ relationships per second and core

• Cost-based query optimizer: complex queries return in milliseconds

• Import 100K-1M records per second transactionally

• Bulk import tens of billions of records in a few hours

Get Started

Neo4j Sandbox

How do I get it? Desktop – Container – Cloud

http://neo4j.com/download/

docker run neo4j

Neo4j Cluster Deployment Options

• Developer: Neo4j Desktop (free Enterprise License)
• On premise – Standalone or via OS package
• Containerized with official Docker Image
• In the Cloud
  • AWS, GCE, Azure
• Using Resource Managers
  • DC/OS – Marathon
  • Kubernetes
  • Docker Swarm

Active Community

• Downloads: 10M+ (3M+ from Neo4j Distribution, 7M+ from Docker)
• Events: 400+, approximate number of Neo4j events per year
• Meetups: 50k+ Meetup members globally
• Trained Developers: 50k+ trained/certified Neo4j professionals

Summary: Graphs allow you to ...

• Keep your rich data model

• Handle relationships efficiently

• Write queries easily

• Develop applications quickly

• Have fun

Thank You!

Questions?!

@neo4j | neo4j.com

@mesirii | Michael Hunger

Users Love Neo4j

Causal Clustering

Core & Replica Servers Causal Consistency

Causal Clustering - Features

• Two Zones – Core + Edge
• Group of Core Servers – Consistent and Partition tolerant (CP)
• Transactional Writes
• Quorum Writes, Cluster Membership, Leader via Raft Consensus
• Scale out with Read Replicas
• Smart Bolt Drivers with
  • Routing, Read & Write Sessions
  • Causal Consistency with Bookmarks

Replica
• For massive query throughput
• Read-only replicas
• Not involved in Consensus Commit
• Disposable, suitable for auto-scaling

Core
• Small group of Neo4j databases
• Fault-tolerant Consensus Commit
• Responsible for data safety

Writing to the Core Cluster

[Diagram: an application server uses the Neo4j driver to write to the Neo4j cluster and receives a success response; sample nodes: Max, Jim, Jane, Mark]

Routed write statements

Driver driver = GraphDatabase.driver( "bolt+routing://aCoreServer" );

// A WRITE session is routed to the core server that accepts writes.
try ( Session session = driver.session( AccessMode.WRITE ) )
{
    try ( Transaction tx = session.beginTransaction() )
    {
        tx.run( "MERGE (user:User {userId: {userId}})",
                parameters( "userId", userId ) );
        tx.success();
    }
}

Bookmark

• Session token
• String (for portability)
• Opaque to application
• Represents ultimate user’s most recent view of the graph
• More capabilities to come
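How a bookmark gives "read your own writes" can be illustrated with a toy model in Python. Everything here is an assumption for illustration: the classes `Core` and `Replica`, the use of a transaction count as the bookmark content, and the immediate catch-up. Real bookmarks are opaque strings handled by the driver, and a real server waits for replication rather than pulling on demand.

```python
class Core:
    """Toy stand-in for the core cluster: an ordered log of committed writes."""

    def __init__(self):
        self.log = []

    def write(self, value):
        self.log.append(value)
        # The application treats the returned bookmark as an opaque string.
        return str(len(self.log))


class Replica:
    """Toy read replica that may lag behind the core."""

    def __init__(self, core):
        self.core = core
        self.applied = 0  # number of log entries applied so far

    def read(self, bookmark=None):
        if bookmark is not None and self.applied < int(bookmark):
            # Stand-in for "wait until this replica has caught up
            # to at least the state the bookmark represents".
            self.applied = int(bookmark)
        return self.core.log[: self.applied]


core = Core()
replica = Replica(core)
bookmark = core.write("user-1")

print(replica.read())          # stale read without a bookmark
print(replica.read(bookmark))  # read that is causally consistent with the write
```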

[Figure: clustering capabilities by release. Neo4j 3.0: Data Redundancy, Massive Throughput, High Availability. Neo4j 3.1 Causal Clustering: Bigger Clusters, Consensus Commit, Built-in load balancing]

Neo4j 3.0: High Availability Cluster
• Master-Slave architecture
• Paxos consensus used for master election
• Transaction committed once written durably on the master
• Practical deployments: 10s of servers

Neo4j 3.1: Causal Cluster
• Two-part cluster: writeable Core and read-only Read Replicas
• Raft protocol used for leader election, membership changes and commitment of all transactions
• Transaction committed once written durably on a majority of the core members
• Practical deployments: 100s of servers
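The "majority of the core members" rule is plain arithmetic, sketched here in Python. The function names are hypothetical; the sketch only shows the quorum calculation, not Raft itself.

```python
def majority(core_size):
    """Smallest number of core members that forms a majority."""
    return core_size // 2 + 1


def is_committed(acks, core_size):
    """A transaction counts as committed once a majority has it durably."""
    return acks >= majority(core_size)


def tolerated_failures(core_size):
    """Core members that can be lost while a majority is still reachable."""
    return core_size - majority(core_size)


for n in (3, 5, 7):
    print(n, majority(n), tolerated_failures(n))
```

This is why core clusters are usually sized with an odd number of members: going from 3 to 4 members raises the majority from 2 to 3 without tolerating any additional failure.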


Writing to the Core Cluster – Raft Consensus Commits


Flexible Authentication Options

Choose authentication method

• Built-in native users repository: testing/POC, single-instance deployments
• LDAP connector to Active Directory or openLDAP: production deployments
• Custom auth provider plugins: special deployment scenarios

[Diagram: Neo4j authenticates against Built-in Native Users, against Active Directory or openLDAP through the LDAP connector, or against a Custom Plugin through the Auth Plugin Extension Module]

Flexible Authentication Options: LDAP Group to Role Mapping

./conf/neo4j.conf

dbms.security.ldap.authorization.group_to_role_mapping= \
  "CN=Neo4j Read Only,OU=groups,DC=example,DC=com" = reader; \
  "CN=Neo4j Read-Write,OU=groups,DC=example,DC=com" = publisher; \
  "CN=Neo4j Schema Manager,OU=groups,DC=example,DC=com" = architect; \
  "CN=Neo4j Administrator,OU=groups,DC=example,DC=com" = admin; \
  "CN=Neo4j Procedures,OU=groups,DC=example,DC=com" = allowed_role

[Diagram: LDAP directory tree. BASE DN: DC=example,DC=com. OU=people contains CN=Bob Smith and CN=Carl Junior. OU=groups contains CN=Neo4j Read Only, CN=Neo4j Read-Write, CN=Neo4j Schema Manager, CN=Neo4j Administrator and CN=Neo4j Procedures, which map to Neo4j permissions]

Use Cases

Case Study: Knowledge Graphs at eBay


Bags

Men’s Backpack

Handbag


Case study: Solving real-time recommendations for the world’s largest retailer

Challenge
• In its drive to provide the best web experience for its customers, Walmart wanted to optimize its online recommendations.
• Walmart recognized the challenge it faced in delivering recommendations with traditional relational database technology.

Use of Neo4j
• Walmart uses Neo4j to quickly query customers’ past purchases, as well as instantly capture any new interests shown in the customers’ current online visit, which is essential for making real-time recommendations.

“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”
- Marcos Vada, Walmart

Result/Outcome
• With Neo4j, Walmart could replace a heavy batch process with a simple, real-time graph database.

Case study: eBay Now Tackles eCommerce Delivery Service Routing with Neo4j

Challenge
• The queries used to select the best courier for eBay's routing system were taking too long, and eBay needed a solution to maintain a competitive service.
• The MySQL joins being used created a code base too slow and complex to maintain.

Use of Neo4j
• eBay is now using Neo4j’s graph database platform to redefine e-commerce, by making delivery of online and mobile orders quick and convenient.

Result/Outcome
• With Neo4j, eBay managed to eliminate the biggest roadblock between retailers and online shoppers: the option to have your item delivered the same day.
• The schema-flexible nature of the database allowed easy extensibility, speeding up development.
• The Neo4j solution was more than 1000x faster than the prior MySQL solution.

“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code.”
– Volker Pacher, eBay

Case study: Solving real-time promotions for a top-tier US retailer

Challenge
• Suffered significant revenue loss due to legacy infrastructure.
• Particularly challenging when handling transaction volumes on peak shopping occasions such as Thanksgiving and Cyber Monday.

Use of Neo4j
• Neo4j is used to revolutionize and reinvent its real-time promotions engine.
• On average, Neo4j processes 90% of this retailer’s 35M+ daily transactions, each 3-22 hops, in 4ms or less.

Result/Outcome
• Reached an all-time high in online revenues, due to the Neo4j-based friction-free solution.
• Neo4j also enabled the company to be one of the first retailers to provide the same promotions across both online and traditional retail channels.

“On average, Neo4j processes 90% of this retailer’s 35M+ daily transactions, each 3-22 hops, in 4ms or less.”
– Top Tier US Retailer

Relational DBs Can’t Handle Relationships Well

• Cannot model or store data and relationships without complexity

• Performance degrades with number and levels of relationships, and database size

• Query complexity grows with need for JOINs

• Adding new types of data and relationships requires schema redesign, increasing time to market

… making traditional databases inappropriate when data relationships are valuable in real-time

• Slow development
• Poor performance
• Low scalability
• Hard to maintain

Unlocking Value from Your Data Relationships

• Model your data as a graph of data and relationships

• Use relationship information in real-time to transform your business

• Add new relationships on the fly to adapt to your changing business

Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, up to 3 levels down

Cypher Query

MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.name = "Andrew K."
RETURN sub.name AS Subordinate, count(report) AS Total
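What the variable-length REPORTS_TO patterns compute can be traced by hand with a small Python sketch. The org chart below is hypothetical (only "Andrew K." comes from the slide), and the helper names `below` and `team_sizes` are made up for illustration. It assumes a tree-shaped hierarchy, where each person has one manager, so counting people and counting paths give the same totals.

```python
# Hypothetical org chart: employee -> the manager they REPORT_TO.
reports_to = {
    "Beth": "Andrew K.",
    "Carl": "Andrew K.",
    "Dina": "Beth",
    "Emil": "Dina",
}


def below(manager, min_depth, max_depth):
    """People whose REPORTS_TO chain reaches `manager` in min..max hops."""
    found = [manager] if min_depth == 0 else []
    frontier = [manager]
    for depth in range(1, max_depth + 1):
        frontier = [e for e, m in reports_to.items() if m in frontier]
        if depth >= min_depth:
            found.extend(frontier)
    return found


def team_sizes(boss):
    """For everyone within 0..3 levels of `boss`, count reports 1..3 levels down."""
    sizes = {}
    for sub in below(boss, 0, 3):
        total = len(below(sub, 1, 3))
        if total:  # the pattern match drops people without any reports
            sizes[sub] = total
    return sizes


print(team_sizes("Andrew K."))
```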
