graphs fun vjug2

How Graphs make Databases Fun again Value in Relationships Michael Hunger (@mesirii) vJUG July 2015


Post on 14-Aug-2015


Page 1: Graphs fun vjug2

How Graphs make Databases Fun again
Value in Relationships

Michael Hunger (@mesirii)
vJUG July 2015

Page 2: Graphs fun vjug2

(Michael)-[:HELPS]->(People)-[:WORK_WITH]->(Neo4j)

• Coding
• Writing
• Speaking
• Helping
• Connecting
• Organizing

Page 3: Graphs fun vjug2

Topics

• Databases are No Fun
• Relational Pain -> Graph Fun
• The world is a Graph
• Neo4j
  • Model, Query, Import

• Having Fun in the Developer Zone
  • GitHub Events
  • Software Analytics
  • Neo4j from Java

Page 4: Graphs fun vjug2

Databases are No Fun

We have all been there

Page 5: Graphs fun vjug2

What pained me

• Object vs. Database Model = Pain
  • Hard to Model
  • Object-relational impedance mismatch (and ORMs)

• Schema evolution = DBA Fights
• Slow queries = JOIN Pain
• Complex Queries = Pages of SQL
• Query Optimization = Denormalization
• Roundtrips (n+1 select, complex operations) = Cumbersome

Page 6: Graphs fun vjug2

What saved me? – Meeting Emil

Geek Cruise in 2008

Page 7: Graphs fun vjug2

Neo4j: A Story of Pain

With a Happy Ending

Page 8: Graphs fun vjug2

History of Neo4j - Problem

• Digital Asset Management System in 2000
• SaaS: many users in many countries
• Two hard use-cases
  • Multi-language keyword search
    • Including synonyms / word hierarchies
  • Access Management to Assets for SaaS Scale
    • Groups, Hierarchies, Permissions, Realtime

Page 9: Graphs fun vjug2

History of Neo4j – Relational Attempt

• Tried with many relational DBs
• JOIN Performance Problems
• Hierarchies, Networks, Graphs

• Modeling Problems
• Data Model evolution

• No Success, even …
• With expensive database consultants!

Page 10: Graphs fun vjug2

History of Neo4j – First working Implementation

• Graph Model & API sketched on a napkin
• Nodes connected by Relationships
• Just like your conceptual model

• Implemented network-database in memory
• Java API, fast Traversals
• Worked well, but …
• No persistence, No Transactions
• Long import / export time from relational storage

Page 11: Graphs fun vjug2

History of Neo4j - Solution

• Evolved to full-fledged database in Java
• With persistence using files + memory mapping
• Transactions with Transaction Log (WAL)
• Lucene for fast entity lookup

• Founded Company in 2007
• Neo4j (REST)-Server
• Neo4j Clustering & HA
• Cypher Query Language

• Today …

Page 12: Graphs fun vjug2

Neo Technology Overview

Product
• Neo4j - World’s leading graph database
• 1M+ downloads, adding 70k+ per month
• 150+ enterprise subscription customers including over 50 of the Global 2000

Company
• Neo Technology, Creator of Neo4j
• 100+ employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital

Page 13: Graphs fun vjug2

What, Who, Where, How?

• Financial Services
• Communications
• Health & Life Sciences
• HR & Recruiting
• Media & Publishing
• Social Web
• Industry & Logistics
• Entertainment
• Consumer Retail
• Information Services
• Business Services

http://neo4j.com/use-cases
http://neo4j.com/customers

Page 14: Graphs fun vjug2

Why should I care?

Because Relationships Matter

Page 15: Graphs fun vjug2

What is it with Relationships?

• World is full of connected people, events, things
• There is “Value in Relationships”!
• What about Data Relationships?
• How do you store your object model?
• How do you explain JOIN tables to your boss?

Page 16: Graphs fun vjug2

Neo4j – allows you to connect the dots

• Was built to efficiently store, query and manage highly connected data

• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable, already on a few machines

Page 17: Graphs fun vjug2

Value from Data Relationships
Common Use Cases

Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection

Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management

Page 18: Graphs fun vjug2

Neo4j Browser – Built-in Learning

Page 19: Graphs fun vjug2

RDBMS to Graph – Familiar Examples

Page 20: Graphs fun vjug2

Neo4j Browser – Visualization

Page 21: Graphs fun vjug2

Demo Meetup Import

Page 22: Graphs fun vjug2

Teaser: Meetup.com Import

• For a Meetup Event
  • Import Attendees
• For each Attendee
  • Import Interests / Topics
  • Import other Meetup Memberships

• Other groups our members are in
• Top 10 topics
• Topics & Groups of active Member

https://github.com/ikwattro/meetup2neo
http://markhneedham.com/blog?s=meetup

Page 23: Graphs fun vjug2

From RDBMS to Neo4j

Relational Pains = Graph Pleasure

Page 24: Graphs fun vjug2

Relational DBs Can’t Handle Relationships Well

• Data model built for tabular forms, not JOINs: managing connections was bolted on, both in schema and in query

• Strict schema is not suitable for the variably structured data that today's applications generate and use

• Data volume and number of JOINs affect the cost of a query operation exponentially

• Variable hierarchies and networks are hard to store and query, so many “patterns” were developed

… often only denormalization makes complex relational queries fast, but it destroys the good normalized data model

Built for Forms
Joins are expensive
Denormalize #FTW

Page 25: Graphs fun vjug2

Unlocking Value from Your Data Relationships

• Model your data naturally as a graph of data and relationships

• Drive graph model from domain and use-cases

• Use relationship information in real-time to transform your business

• Add new relationships on the fly to adapt to your changing requirements

Page 26: Graphs fun vjug2

High Query Performance with a Native Graph DB

• Relationships are first-class citizens
• No need for joins, just follow pre-materialized relationships of nodes
• Query & Data-locality – navigate out from your starting points
• Only load what’s needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs
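The difference between joining and following stored relationships can be sketched in a few lines of Python. This is only an in-memory illustration, not how Neo4j stores data: the KNOWS pairs are assumed sample data, and `friends_via_join_table`, `build_adjacency` and `friends_via_adjacency` are hypothetical helper names.

```python
from collections import defaultdict

# Assumed sample data: (person, friend) rows, as a relational join table would hold them.
KNOWS_ROWS = [
    ("Andreas", "Tobias"),
    ("Andreas", "Mica"),
    ("Andreas", "Delia"),
    ("Tobias", "Mica"),
]


def friends_via_join_table(rows, person):
    """Relational style without an index: examine every row of the join table."""
    examined = 0
    found = []
    for source, target in rows:
        examined += 1
        if source == person:
            found.append(target)
    return found, examined


def build_adjacency(rows):
    """Graph style: store each node's outgoing relationships with the node, at write time."""
    adjacency = defaultdict(list)
    for source, target in rows:
        adjacency[source].append(target)
    return adjacency


def friends_via_adjacency(adjacency, person):
    """Follow the relationships stored with the start node; work depends on its degree only."""
    found = list(adjacency.get(person, []))
    return found, len(found)


if __name__ == "__main__":
    adjacency = build_adjacency(KNOWS_ROWS)
    print(friends_via_join_table(KNOWS_ROWS, "Tobias"))
    print(friends_via_adjacency(adjacency, "Tobias"))
```

A relational database would normally use an index instead of a full scan, so the sketch only shows why traversal work can stay proportional to the neighborhood visited rather than to the total data size.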

Page 27: Graphs fun vjug2

Relational Versus Graph Models

[Figure: the same friendships shown twice. Relational Model: tables labeled Person, Friend and Person-Friend. Graph Model: ANDREAS, TOBIAS, MICA and DELIA connected by KNOWS relationships.]

Page 28: Graphs fun vjug2

For Instance …

Page 29: Graphs fun vjug2

… Is Actually

Page 30: Graphs fun vjug2

Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, each up to 3 levels down

SQL Query

[Figure: the equivalent SQL query, shown as an image on the slide]

Cypher Query

MATCH (boss)-[:MANAGES*0..3]->(mgr),
      (mgr)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN mgr.name AS Subordinate, count(report) AS Total
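To make the variable-length MANAGES pattern of the Cypher query on this slide more concrete, here is a rough Python sketch of the same idea: walk 0..3 levels down from the boss, and for each person found count the reports 1..3 levels below. The hierarchy is assumed sample data (a tree, so every path is unique), and `reachable` and `report_totals` are hypothetical helpers, not part of any Neo4j API.

```python
# Assumed sample hierarchy: manager -> direct reports (a tree, so each path is unique).
MANAGES = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
}


def reachable(start, min_depth, max_depth):
    """Names reachable from start via min_depth..max_depth MANAGES hops, one entry per path."""
    results = []
    frontier = [start]
    if min_depth == 0:
        results.append(start)
    for depth in range(1, max_depth + 1):
        # Expand the frontier by exactly one MANAGES hop.
        frontier = [report for mgr in frontier for report in MANAGES.get(mgr, [])]
        if depth >= min_depth:
            results.extend(frontier)
    return results


def report_totals(boss):
    """For the boss and everyone up to 3 levels below, count reports 1..3 levels down."""
    totals = {}
    for mgr in reachable(boss, 0, 3):
        count = len(reachable(mgr, 1, 3))
        if count:  # as with MATCH, people without any reports produce no row
            totals[mgr] = count
    return totals


if __name__ == "__main__":
    print(report_totals("John Doe"))
```

The two depth ranges correspond to `*0..3` and `*1..3` in the pattern; changing how deep the query looks means changing a number, not adding another join.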

Page 31: Graphs fun vjug2

High Query Performance: Some Numbers

• Traverse 2-4M+ relationships per second per core

• Cost based query optimizer – complex queries return in milliseconds

• Import 100K-1M records per second transactionally

• Bulk import tens of billions of records in a few hours

Page 32: Graphs fun vjug2

Working with a Graph

Model, Import, Query

Page 33: Graphs fun vjug2

The Whiteboard Model Is the Physical Model

Eliminates Graph-to-Relational Mapping

In your data model
• Bridge the gap between business and IT models

In your application
• Greatly reduce need for application code

Page 34: Graphs fun vjug2

Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

[Figure: two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and a CAR node (brand: "Volvo", model: "V70"), connected by LOVES (twice), LIVES WITH, DRIVES and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]
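The node and relationship components described on this slide map onto very small data structures. The following Python sketch is one possible in-memory rendering, not Neo4j's implementation: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical, and which person drives the car (and that `since` belongs to DRIVES) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes: labeled objects with name-value properties (values taken from the slide).
dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Relationships: typed, directed, optionally carrying properties.
# Assumption: Dan is the driver and "since" sits on DRIVES.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """End nodes of relationships of the given type that start at node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


if __name__ == "__main__":
    print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```

Direction matters here: `outgoing(ann, "LOVES")` is empty until a second relationship from Ann to Dan is added.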

Page 35: Graphs fun vjug2

Cypher: Powerful and Expressive Query Language

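MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )

[Figure: the pattern annotated. Each parenthesized part is a NODE with a LABEL (Person) and a PROPERTY (name); the arrow between Dan and Ann is the LOVES relationship.]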

Page 36: Graphs fun vjug2

Getting Data into Neo4j

Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships

Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second

4.58 million things and their relationships…
Loads in 100 seconds!
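What a CSV import has to do, conceptually, is turn each row into nodes and relationships without duplicating nodes that appear in several rows. The Python sketch below illustrates that get-or-create step in memory; it is not the LOAD CSV or neo4j-import implementation, and the file contents, the column names and the `load_rows` helper are assumptions.

```python
import csv
import io

# Assumed sample file contents; the column names are hypothetical.
CSV_TEXT = """person,movie
Tom Hanks,Forrest Gump
Tom Hanks,Apollo 13
Gary Sinise,Forrest Gump
"""


def load_rows(text):
    """Turn each CSV row into two get-or-create nodes and one relationship."""
    people, movies, acted_in = {}, {}, []
    for row in csv.DictReader(io.StringIO(text)):
        # Get-or-create keeps a node unique even if it appears in many rows.
        person = people.setdefault(row["person"], {"name": row["person"]})
        movie = movies.setdefault(row["movie"], {"title": row["movie"]})
        acted_in.append((person["name"], "ACTED_IN", movie["title"]))
    return people, movies, acted_in


if __name__ == "__main__":
    people, movies, acted_in = load_rows(CSV_TEXT)
    print(len(people), len(movies), len(acted_in))
```

In Cypher, the get-or-create step is roughly what combining LOAD CSV with MERGE provides, with the writes happening inside a transaction.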

Page 37: Graphs fun vjug2

Querying Your Data

Page 38: Graphs fun vjug2

Basic Pattern: Tom Hanks Movies?

MATCH (:Person {name:"Tom Hanks"} ) -[:ACTED_IN]-> (:Movie {title:"Forrest Gump"} )

[Figure: the pattern annotated with NODE, LABEL and PROPERTY markers: Tom Hanks connected to Forrest Gump by an ACTED_IN relationship.]

Page 39: Graphs fun vjug2

Basic Query: Tom Hanks' Movies?

MATCH (actor:Person)-[:ACTED_IN]->(m:Movie)

WHERE actor.name = "Tom Hanks"

RETURN *

Page 40: Graphs fun vjug2

Basic Query: Tom Hanks' Movies?

Page 41: Graphs fun vjug2

Query Comparison: Colleagues of Tom Hanks?

SELECT *
FROM Person AS actor
  JOIN ActorMovie AS am1 ON (actor.id = am1.actor_id)
  JOIN ActorMovie AS am2 ON (am1.movie_id = am2.movie_id)
  JOIN Person AS coll ON (coll.id = am2.actor_id)
WHERE actor.name = 'Tom Hanks'

MATCH (actor:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coll:Person)
WHERE actor.name = "Tom Hanks"
RETURN *
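The colleague pattern reads as "from the actor, out to a movie, back in to another actor". A minimal Python sketch of that two-hop walk follows; the cast data is a small assumed sample, and `cast_by_movie` and `colleagues` are hypothetical helpers rather than anything from a Neo4j driver.

```python
# Assumed sample data: actor -> movies they acted in.
ACTED_IN = {
    "Tom Hanks": ["Forrest Gump", "Apollo 13"],
    "Robin Wright": ["Forrest Gump"],
    "Gary Sinise": ["Forrest Gump", "Apollo 13"],
    "Kevin Bacon": ["Apollo 13"],
}


def cast_by_movie(acted_in):
    """Invert the mapping so each movie knows its incoming ACTED_IN relationships."""
    cast = {}
    for actor, movies in acted_in.items():
        for movie in movies:
            cast.setdefault(movie, []).append(actor)
    return cast


def colleagues(actor):
    """Walk actor -> movie -> other actors, one result row per shared movie."""
    cast = cast_by_movie(ACTED_IN)
    rows = []
    for movie in ACTED_IN.get(actor, []):
        for other in cast[movie]:
            if other != actor:  # the same ACTED_IN relationship is not walked twice
                rows.append((other, movie))
    return rows


if __name__ == "__main__":
    for name, movie in colleagues("Tom Hanks"):
        print(name, "in", movie)
```

Skipping the starting actor mirrors how a single Cypher MATCH does not reuse the same relationship within one pattern; the SQL version as written would also pair Tom Hanks with himself.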

Page 42: Graphs fun vjug2

Basic Query Comparison: Colleagues of Tom Hanks?

Page 43: Graphs fun vjug2

Most prolific actors and their filmography?

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)

RETURN p.name, count(*), collect(m.title) as movies

ORDER BY count(*) desc, p.name asc

LIMIT 10;
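The aggregation in this query (group by actor, count, collect the titles, order, limit) can be mirrored step by step in plain Python. This is a sketch over assumed sample data; `most_prolific` is a hypothetical helper, and the real query runs inside the database rather than over a list.

```python
# Assumed sample data: one (actor, movie title) pair per ACTED_IN relationship.
ACTED_IN = [
    ("Tom Hanks", "Forrest Gump"),
    ("Tom Hanks", "Apollo 13"),
    ("Gary Sinise", "Forrest Gump"),
    ("Gary Sinise", "Apollo 13"),
    ("Robin Wright", "Forrest Gump"),
]


def most_prolific(pairs, limit=10):
    """Group by actor, collect titles, then order by count desc and name asc."""
    movies_by_actor = {}
    for name, title in pairs:
        # Grouping key is the actor name, like p.name in the RETURN clause.
        movies_by_actor.setdefault(name, []).append(title)
    rows = [(name, len(titles), titles) for name, titles in movies_by_actor.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:limit]


if __name__ == "__main__":
    for name, count, movies in most_prolific(ACTED_IN):
        print(name, count, movies)
```

As in the Cypher version, the non-aggregated return value (`p.name`) acts as the implicit grouping key, so no separate GROUP BY is written.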

Page 44: Graphs fun vjug2

Most prolific actors and their filmography?

Page 45: Graphs fun vjug2

Neo4j Query Planner

Cost based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
  • Non optimal queries
  • Cartesian Product
  • Missing Indexes, Global Scans
  • Typos
  • Massive Fan-Out

Page 46: Graphs fun vjug2

Query Planner

Slight change: add a :Person label -> more stats available -> new plan with fewer database hits

Page 47: Graphs fun vjug2

Demo Software Analytics

jqassistant.org

Page 48: Graphs fun vjug2

Software Analytics

Software is connected information, aka a graph
• Source -> AST
• Inheritance, Composition, Delegation
• Call Trees
• Runtime Memory
• Dependencies
• Modules, Libraries
• Tests
• ...

https://jqassistant.org

Page 49: Graphs fun vjug2

jQAssistant

• GeekCruise: My first Neo4j project
• Software deteriorates
• Develop rules and enforce them
• Commercial Tools too inflexible
• Open Source Software ...
• Scanner -> Enhancer -> Analyzer

• Enrichment, Concepts and Rulez in Cypher
• Scanner Plugins
• Integrate in Build Process
• Fail, Generate Reports, ...

https://jqassistant.org

Page 50: Graphs fun vjug2
Page 51: Graphs fun vjug2

Let's explore ... The JDK

https://jqassistant.org

Page 52: Graphs fun vjug2

Demo GitHub Events

github.com/ikwattro

Page 53: Graphs fun vjug2

Demo: GitHub Events

• GitHub: Social Coding
• GH-Events:
  • Fork, Comment, PR, ...
  • Archive + API

• Application for Import:
  • GitHub -> PHP -> RabbitMQ -> Neo4j

• Data:
  • 1M Users, 1.4M Repos, 62k Orgs
  • 13M Events

https://github.com/ikwattro/github2cypher
https://github.com/ikwattro/github-event

Page 54: Graphs fun vjug2

Model: GitHub Events (partial)

Page 55: Graphs fun vjug2

Event: Watch a Repository

MATCH (w:WatchEvent)
WITH w LIMIT 1
MATCH p = (w)-[:EVENT_TIME]->(:Minute)<-[:CHILD*]-(:Year),
      (w)-[:EVENT_ACTOR]->(u:User)-->(r:Repository)<-[:WATCHED_REPOSITORY]-(w)
RETURN *;

https://twitter.com/ikwattro/status/618431227100532737

Page 56: Graphs fun vjug2

Using Neo4j from Java
Choices Galore!

Neo4j.com/developer/java

Page 57: Graphs fun vjug2

Neo4j: OSS, Made in Java

• Java 7 / 8, Scala
• High Performance IO, Memory Mapping
• Collections, Caches, Cursors
• Remoting
• Paxos, Raft, ...
• Libraries
  • Jetty, Lucene, Netty, Parboiled, ...

https://github.com/neo4j/neo4j

Page 58: Graphs fun vjug2

Using: Neo4j from Java

• Neo4j Java APIs
  • Server Extensions
  • Embedded

• (parallel) Batch-Import APIs
• REST / HTTP
• JDBC Driver
• Neo4j-OGM
• Spring Data Neo4j
• Upcoming: Binary Protocol Driver

https://neo4j.com/developer/java

Page 59: Graphs fun vjug2

Get up to speed with Neo4j
Quickly and Easily

Page 61: Graphs fun vjug2

Thank You!
Ask Questions, or Tweet

@neo4j | http://neo4j.com
@mesirii | Michael Hunger