How Graphs Make Databases Fun Again
Value in Relationships
Michael Hunger (@mesirii), vJUG, July 2015
(Michael)-[:HELPS]->(People)-[:WORK_WITH]->(Neo4j)
• Coding
• Writing
• Speaking
• Helping
• Connecting
• Organizing
Topics
• Databases are No Fun
• Relational Pain -> Graph Fun
• The world is a Graph
• Neo4j
• Model, Query, Import
• Having Fun in the Developer Zone
• GitHub Events
• Software Analytics
• Neo4j from Java
Databases are No Fun
We have all been there
What pained me
• Object vs. Database Model = Pain
• Hard to Model
• Object-relational impedance mismatch (and ORMs)
• Schema evolution = DBA Fights
• Slow queries = JOIN Pain
• Complex Queries = Pages of SQL
• Query Optimization = Denormalization
• Roundtrips (n+1 select, complex operations) = Cumbersome
What saved me? – Meeting Emil
Geek Cruise in 2008
Neo4j: A Story of Pain
With a Happy Ending
History of Neo4j - Problem
• Digital Asset Management System in 2000
• SaaS with many users in many countries
• Two hard use-cases
• Multi-language keyword search
• Including synonyms / word hierarchies
• Access Management to Assets at SaaS scale
• Groups, Hierarchies, Permissions, Realtime
History of Neo4j – Relational Attempt
• Tried with many relational DBs
• JOIN Performance Problems
• Hierarchies, Networks, Graphs
• Modeling Problems
• Data Model evolution
• No success, even with expensive database consultants!
History of Neo4j – First working Implementation
• Graph Model & API sketched on a napkin
• Nodes connected by Relationships
• Just like your conceptual model
• Implemented network-database in memory
• Java API, fast Traversals
• Worked well, but …
• No persistence, No Transactions
• Long import / export time from relational storage
History of Neo4j - Solution
• Evolved to full-fledged database in Java
• With persistence using files + memory mapping
• Transactions with Transaction Log (WAL)
• Lucene for fast entity lookup
• Founded Company in 2007
• Neo4j (REST)-Server
• Neo4j Clustering & HA
• Cypher Query Language
• Today …
Neo Technology Overview
Product
• Neo4j - World’s leading graph database
• 1M+ downloads, adding 70k+ per month
• 150+ enterprise subscription customers including over 50 of the Global 2000
Company
• Neo Technology, Creator of Neo4j
• 100+ employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital
What, Who, Where, How?
• Financial Services
• Communications
• Health & Life Sciences
• HR & Recruiting
• Media & Publishing
• Social Web
• Industry & Logistics
• Entertainment
• Consumer Retail
• Information Services
• Business Services
http://neo4j.com/use-cases http://neo4j.com/customers
Why should I care?
Because Relationships Matter
What is it with Relationships?
• World is full of connected people, events, things
• There is “Value in Relationships”!
• What about Data Relationships?
• How do you store your object model?
• How do you explain JOIN tables to your boss?
Neo4j – allows you to connect the dots
• Was built to efficiently store, query and manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable, already on a few machines
Value from Data Relationships
Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
Neo4j Browser – Built-in Learning
RDBMS to Graph – Familiar Examples
Neo4j Browser – Visualization
Demo Meetup Import
Teaser: Meetup.com Import
• For a Meetup Event
• Import Attendees
• For each Attendee
• Import Interests / Topics
• Import other Meetup Memberships
• Other groups our members are in
• Top 10 topics
• Topics & Groups of active members
https://github.com/ikwattro/meetup2neo
http://markhneedham.com/blog?s=meetup
From RDBMS to Neo4j
Relational Pains = Graph Pleasure
Relational DBs Can’t Handle Relationships Well
• Data model built for tabular forms, not JOINs: managing connections was bolted on, both in schema and in query
• Strict schema is not suitable for the variably structured data that today’s applications generate and use
• Data volume and number of JOINs affect the cost of a query operation exponentially
• Variable hierarchies and networks are hard to store and query, so many “patterns” were developed
… often only denormalization makes complex relational queries fast, but it destroys the well-normalized data model
Built for Forms
Joins are expensive
Denormalize #FTW
Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Drive graph model from domain and use-cases
• Use relationship information in real-time to transform your business
• Add new relationships on the fly to adapt to your changing requirements
High Query Performance with a Native Graph DB
• Relationships are first-class citizens
• No need for joins, just follow pre-materialized relationships of nodes
• Query & data locality – navigate out from your starting points
• Only load what’s needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs
Relational Versus Graph Models
[Figure: Relational Model (tables Person, Person-Friend, Friend) versus Graph Model (nodes ANDREAS, TOBIAS, MICA and DELIA linked by KNOWS relationships)]
For Instance …
… Is Actually
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, each up to 3 levels down
Cypher Query
MATCH (boss)-[:MANAGES*0..3]->(mgr),
      (mgr)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN mgr.name AS Subordinate, count(report) AS Total
SQL Query
[SQL query not captured in this transcript]
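To make the variable-length pattern concrete, here is a minimal Python sketch of roughly what the Cypher query asks for. The org chart is made-up sample data and the helper names (`reachable`, `subordinate_totals`) are hypothetical; it assumes the hierarchy is a tree, so each person is reached by exactly one path. It is an illustration of the traversal, not how Neo4j executes it.

```python
from collections import Counter

# Hypothetical sample data: manager -> direct reports (a tree).
MANAGES = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
}

def reachable(start, min_hops, max_hops):
    """Yield everyone reached from `start` via min_hops..max_hops MANAGES hops."""
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            yield from frontier
        # Move one level down the hierarchy.
        frontier = [r for person in frontier for r in MANAGES.get(person, [])]

def subordinate_totals(boss):
    """Count reports (1..3 levels down) for every mgr within 0..3 levels of boss."""
    totals = Counter()
    for mgr in reachable(boss, 0, 3):          # (boss)-[:MANAGES*0..3]->(mgr)
        for _report in reachable(mgr, 1, 3):   # (mgr)-[:MANAGES*1..3]->(report)
            totals[mgr] += 1                   # count(report), grouped by mgr
    return dict(totals)

print(subordinate_totals("John Doe"))
```

People without any reports do not show up in the result, just as the MATCH pattern only returns a `mgr` that has at least one `report`.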
High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second per core
• Cost based query optimizer – complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
Working with a Graph
Model, Import, Query
The Whiteboard Model Is the Physical Model
Eliminates Graph-to-Relational Mapping
In your data model
• Bridge the gap between business and IT models
In your application
• Greatly reduce need for application code
[Figure: property graph with a person node (name: “Dan”, born: May 29, 1970, twitter: “@dan”), a person node (name: “Ann”, born: Dec 5, 1975), a CAR node (brand: “Volvo”, model: “V70”), a DRIVES relationship, and a relationship property since: Jan 10, 2011]
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: two PERSON nodes with LOVES, LOVES, LIVES WITH and OWNS relationships]
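The components above are small enough to sketch in a few lines of Python. This is only an in-memory illustration of the property graph model, not Neo4j's storage or API; the class names, the `outgoing` helper, and the choice of which relationship carries the `since` property are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                      # nodes can be labeled
    properties: dict = field(default_factory=dict)   # name-value properties

@dataclass
class Relationship:
    start: Node                                      # direction: start -> end
    type: str                                        # relationship type
    end: Node
    properties: dict = field(default_factory=dict)   # name-value properties

dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Assumed wiring, loosely based on the slide figures.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]

def outgoing(node, rel_type):
    """Follow relationships of one type out of a node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]

print([n.properties["name"] for n in outgoing(dan, "LOVES")])
print(outgoing(dan, "DRIVES")[0].properties["brand"])
```

The relationship is a value of its own with a type, a direction and properties, which is the difference to a foreign key or a join table row.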
Cypher: Powerful and Expressive Query Language
MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )
[Figure annotations: each (:Person { name: … }) is a NODE with a LABEL and a PROPERTY; LOVES is the relationship from Dan to Ann]
Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
4.58 million things and their relationships…
Loads in 100 seconds!
Querying Your Data
Basic Pattern: Tom Hanks Movies?
MATCH (:Person {name:"Tom Hanks"} ) -[:ACTED_IN]-> (:Movie {title:"Forrest Gump"} )
[Figure annotations: (:Person {name:"Tom Hanks"}) and (:Movie {title:"Forrest Gump"}) are NODEs, each with a LABEL and a PROPERTY; ACTED_IN is the relationship between them]
Basic Query: Tom Hanks’ Movies?
MATCH (actor:Person)-[:ACTED_IN]->(m:Movie)
WHERE actor.name = "Tom Hanks"
RETURN *
Query Comparison: Colleagues of Tom Hanks?
SELECT *
FROM Person AS actor
JOIN ActorMovie AS am1 ON (actor.id = am1.actor_id)
JOIN ActorMovie AS am2 ON (am1.movie_id = am2.movie_id)
JOIN Person AS coll ON (coll.id = am2.actor_id)
WHERE actor.name = 'Tom Hanks'

MATCH (actor:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coll:Person)
WHERE actor.name = "Tom Hanks"
RETURN *
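The relational side of this comparison can be tried out with Python's built-in `sqlite3` module. The sketch below assumes a minimal schema (`Person`, `Movie`, `ActorMovie`) and a handful of sample rows that are not part of the talk. One deliberate difference to the slide: the extra `coll.id <> actor.id` condition keeps Tom Hanks out of his own colleague list, which the Cypher pattern gets implicitly because the two ACTED_IN relationships in one pattern must be distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ActorMovie (actor_id INTEGER, movie_id INTEGER);
    INSERT INTO Person VALUES (1, 'Tom Hanks'), (2, 'Robin Wright'), (3, 'Meg Ryan');
    INSERT INTO Movie VALUES (10, 'Forrest Gump'), (11, 'Sleepless in Seattle');
    INSERT INTO ActorMovie VALUES (1, 10), (2, 10), (1, 11), (3, 11);
""")

# Two passes over the join table: actor -> movie, then movie -> colleague.
rows = conn.execute("""
    SELECT DISTINCT coll.name
    FROM Person AS actor
    JOIN ActorMovie AS am1 ON (actor.id = am1.actor_id)
    JOIN ActorMovie AS am2 ON (am1.movie_id = am2.movie_id)
    JOIN Person AS coll ON (coll.id = am2.actor_id)
    WHERE actor.name = 'Tom Hanks' AND coll.id <> actor.id
    ORDER BY coll.name
""").fetchall()

colleagues = [name for (name,) in rows]
print(colleagues)
```

Every hop in the graph pattern becomes another JOIN over `ActorMovie` here, which is the point the comparison slide is making.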
Basic Query Comparison: Colleagues of Tom Hanks?
Most prolific actors and their filmography?
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, count(*), collect(m.title) as movies
ORDER BY count(*) desc, p.name asc
LIMIT 10;
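In this query `p.name` is not aggregated, so it acts as the implicit grouping key for `count(*)` and `collect(m.title)`. A rough Python equivalent, using made-up (person, movie) pairs and a hypothetical `most_prolific` helper, might look like this:

```python
from collections import defaultdict

# Hypothetical sample data: one (person, movie) pair per ACTED_IN relationship.
ACTED_IN = [
    ("Tom Hanks", "Forrest Gump"),
    ("Tom Hanks", "Apollo 13"),
    ("Tom Hanks", "Cast Away"),
    ("Meg Ryan", "Sleepless in Seattle"),
    ("Meg Ryan", "Top Gun"),
    ("Robin Wright", "Forrest Gump"),
]

def most_prolific(pairs, limit=10):
    movies_by_actor = defaultdict(list)
    for name, title in pairs:
        movies_by_actor[name].append(title)        # collect(m.title) per p.name
    rows = [(name, len(titles), titles) for name, titles in movies_by_actor.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))   # ORDER BY count(*) desc, p.name asc
    return rows[:limit]                            # LIMIT 10

for name, count, movies in most_prolific(ACTED_IN):
    print(name, count, movies)
```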
Neo4j Query Planner
Cost-based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
• Non-optimal queries
• Cartesian Product
• Missing Indexes, Global Scans
• Typos
• Massive Fan-Out
Query Planner
Slight change, add a :Person label -> more stats available -> new plan with fewer database-hits
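Why a label can reduce the work is easy to see in a toy model. The sketch below is only an analogy, assuming a made-up store of 1,000 nodes and counting "hits" as nodes touched; Neo4j's real database-hit accounting and planner are more involved.

```python
# Hypothetical store: 999 Movie nodes and a single Person node.
nodes = [{"labels": {"Movie"}, "name": f"Movie {i}"} for i in range(999)]
nodes.append({"labels": {"Person"}, "name": "Tom Hanks"})

# A per-label lookup, standing in for what the database knows about labels.
by_label = {}
for node in nodes:
    for label in node["labels"]:
        by_label.setdefault(label, []).append(node)

def find_by_name(candidates, name):
    """Return the matching nodes and how many nodes had to be touched."""
    hits = 0
    found = []
    for node in candidates:
        hits += 1
        if node["name"] == name:
            found.append(node)
    return found, hits

# Without a label every node is a candidate; with :Person only one is.
_, hits_all = find_by_name(nodes, "Tom Hanks")
_, hits_label = find_by_name(by_label["Person"], "Tom Hanks")
print(hits_all, hits_label)
```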
Demo Software Analytics
jqassistant.org
Software Analytics
Software is connected information, aka a graph
• Source -> AST
• Inheritance, Composition, Delegation
• Call Trees
• Runtime Memory
• Dependencies
• Modules, Libraries
• Tests
• ...
https://jqassistant.org
jQAssistant
• GeekCruise: My first Neo4j project
• Software deteriorates
• Develop rules and enforce them
• Commercial Tools too inflexible
• Open Source Software ...
• Scanner -> Enhancer -> Analyzer
• Enrichment, Concepts and Rules in Cypher
• Scanner Plugins
• Integrate in Build Process
• Fail, Generate Reports, ...
https://jqassistant.org
Let’s explore ...
... The JDK
Demo GitHub Events
github.com/ikwattro
Demo: GitHub Events
• GitHub: Social Coding
• GH-Events:
• Fork, Comment, PR, ...
• Archive + API
• Application for Import:
• GitHub -> PHP -> RabbitMQ -> Neo4j
• Data:
• 1M Users, 1.4M Repos, 62k Orgs
• 13M Events
https://github.com/ikwattro/github2cypher
https://github.com/ikwattro/github-event
Model: GitHub Events (partial)
Event: Watch a Repository
MATCH (w:WatchEvent)
WITH w LIMIT 1
MATCH p = (w)-[:EVENT_TIME]->(:Minute)<-[:CHILD*]-(:Year),
      (w)-[:EVENT_ACTOR]->(u:User)-->(r:Repository)<-[:WATCHED_REPOSITORY]-(w)
RETURN *;
https://twitter.com/ikwattro/status/618431227100532737
Using Neo4j from Java
Choices Galore!
neo4j.com/developer/java
Neo4j: OSS, Made in Java
• Java 7 / 8, Scala
• High Performance IO, Memory Mapping
• Collections, Caches, Cursors
• Remoting
• Paxos, Raft, ...
• Libraries
• Jetty, Lucene, Netty, Parboiled, ...
https://github.com/neo4j/neo4j
Using: Neo4j from Java
• Neo4j Java APIs
• Server Extensions
• Embedded
• (parallel) Batch-Import APIs
• REST / HTTP
• JDBC Driver
• Neo4j-OGM
• Spring Data Neo4j
• Upcoming: Binary Protocol Driver
https://neo4j.com/developer/java
Get up to speed with Neo4j
Quickly and Easily
There Are Lots of Ways to Easily Learn Neo4j
neo4j.com/developer
Thank You!
Ask Questions, or Tweet
@neo4j | http://neo4j.com
@mesirii | Michael Hunger