RDBMS to Graphs: Harnessing the Power of the Graph
September 2015
Ryan Boyd @ryguyrg
Agenda
• Origins of Neo4j
• Benefits of Graphs
• Designing Your Graph Model
• Query Time!
• Fitting Neo4j into Your Enterprise Architecture
• Q&A
Neo Technology Overview
Product
• Neo4j: the world’s leading graph database
• 150+ enterprise subscription customers, including over 50 of the Global 2000
Company
• Neo Technology, creator of Neo4j
• 100 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding
Neo4j Adoption by Selected Verticals
Financial Services • Communications • Health & Life Sciences • HR & Recruiting • Media & Publishing • Social Web • Industry & Logistics • Entertainment • Consumer Retail • Information Services • Business Services
How Customers Use Neo4j
Network & Data Center • Master Data Management • Social • Recommendations • Identity & Access • Search & Discovery • GEO
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
Neo4j Leads the Graph Database Revolution
“Neo4j is the current market leader in graph databases.”
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
Sources:
• IT Market Clock for Database Management Systems, 2014: https://www.gartner.com/doc/2852717/it-market-clock-database-management
• TechRadar™: Enterprise DBMS, Q1 2014: http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
• Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates): http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/
High Business Value in Data Relationships
Data is increasing in volume…
• New digital processes
• More online transactions
• New social networks
• More devices
… and is getting more connected: customers, products, processes, and devices interact and relate to each other.
Using data relationships unlocks value:
• Real-time recommendations
• Fraud detection
• Master data management
• Network and IT operations
• Identity and access management
• Graph-based search
Early adopters became industry leaders
Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with the number and levels of relationships, and with database size
• Query complexity grows with the need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real time
Slow development • Poor performance • Low scalability • Hard to maintain
Modeling as a Graph
The Whiteboard Model Is the Physical Model
Figure: nodes for Dan (name: “Dan”, born: May 29, 1970, twitter: “@dan”) and Ann (name: “Ann”, born: Dec 5, 1975), a CAR node (brand: “Volvo”, model: “V70”), and a relationship with the property since: Jan 10, 2011.
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
Figure: two PERSON nodes connected by LOVES relationships in both directions and by a LIVES WITH relationship.
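To make the two components concrete, here is a minimal in-memory sketch in plain Python (not Neo4j's API). Nodes carry labels and name-value properties, and relationships carry a type, a direction, and their own properties. The class names, the `outgoing` helper, and the placement of `since` on LIVES_WITH are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph object: a set of labels plus name-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection from `start` to `end`, with its own properties."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship("LOVES", dan, ann),
    Relationship("LOVES", ann, dan),
    # Assumption: the slide's "since" property is placed on LIVES_WITH.
    Relationship("LIVES_WITH", dan, ann, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Follow relationships of one type in their stored direction only."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])  # ['Ann']
```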
Relational Versus Graph Models
Figure: in the relational model, Person and Friend rows are linked through a Person-Friend join table. In the graph model, Andreas, Tobias, Mica and Delia are connected directly by KNOWS relationships.
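Here is a small Python sketch of the contrast, using hypothetical friendship rows (the slide's actual pairs are not recoverable). The relational lookup goes through the Person-Friend join table, while the graph version follows each person's KNOWS list directly. The function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows; the exact friendships on the slide are not known.
person = {1: "Andreas", 2: "Tobias", 3: "Mica", 4: "Delia"}
person_friend = [(1, 2), (1, 3), (2, 4)]  # join table: (person_id, friend_id)


def friends_relational(name):
    """Relational style: name -> id, scan the join table, then id -> name."""
    pid = next(i for i, n in person.items() if n == name)
    return sorted(person[f] for p, f in person_friend if p == pid)


# Graph style: each node keeps its KNOWS relationships, so a lookup
# follows stored references instead of joining through a separate table.
knows = defaultdict(list)
for p, f in person_friend:
    knows[person[p]].append(person[f])


def friends_graph(name):
    return sorted(knows[name])


print(friends_relational("Andreas"), friends_graph("Andreas"))
# ['Mica', 'Tobias'] ['Mica', 'Tobias']
```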
Let’s Model!
Customer, Supplier, and Product (Master Data); Orders (Activity)
The Domain Model
Except… not just any example: the Quintessential Northwind Example!
(Northwind)-[:TO]->(Graph): Building the Graph Model
Building Relationships in Graphs
Figure: an Employee node connected by SOLD relationships to two Order nodes.
Locate Foreign Keys
(FKs)-[:BECOME]->(Relationships)
Correct Directions
Simple Join Tables Become Relationships
Attributed Join Tables Become Relationships with Properties
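The three rules can be sketched in Python on a few Northwind-style rows. The column names follow the classic Northwind schema, but the rows, the tuple layout `(start_label, start_id, type, end_label, end_id, properties)`, and the property names `unitPrice` and `quantity` are assumptions made for illustration. This is not the exact model used in the exercise.

```python
# Illustrative Northwind-style rows (classic column names; values for illustration).
employees = [
    {"EmployeeID": 2, "FirstName": "Andrew", "ReportsTo": None},
    {"EmployeeID": 5, "FirstName": "Steven", "ReportsTo": 2},
]
orders = [{"OrderID": 10248, "EmployeeID": 5}]
order_details = [  # attributed join table between Orders and Products
    {"OrderID": 10248, "ProductID": 11, "UnitPrice": 14.0, "Quantity": 12},
]


def to_relationships():
    rels = []
    # FK Employees.ReportsTo becomes (:Employee)-[:REPORTS_TO]->(:Employee).
    for e in employees:
        if e["ReportsTo"] is not None:
            rels.append(("Employee", e["EmployeeID"], "REPORTS_TO",
                         "Employee", e["ReportsTo"], {}))
    # FK Orders.EmployeeID points from order to employee, but the relationship
    # gets the direction that reads naturally: (:Employee)-[:SOLD]->(:Order).
    for o in orders:
        rels.append(("Employee", o["EmployeeID"], "SOLD", "Order", o["OrderID"], {}))
    # Attributed join table: each row becomes one relationship, and the
    # non-key columns become relationship properties.
    for d in order_details:
        props = {"unitPrice": d["UnitPrice"], "quantity": d["Quantity"]}
        rels.append(("Order", d["OrderID"], "PRODUCT", "Product", d["ProductID"], props))
    return rels


for r in to_relationships():
    print(r)
```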
Working Subset (Today’s Exercise)
Northwind Graph Model
Querying Your Data
Basic Query: Who do people report to?
MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})
Annotated: each (…) is a NODE, :Employee is its LABEL, {firstName: …} is a PROPERTY, and -[:REPORTS_TO]-> is the relationship from Steven to Andrew.
MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN *
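As a rough analogy in plain Python (this is not how Neo4j executes Cypher), the two patterns above amount to the following over a list of REPORTS_TO edges. The employee data is modeled on the Northwind sample and should be treated as illustrative.

```python
# REPORTS_TO edges as (subordinate, manager); illustrative Northwind-style data.
REPORTS_TO = [
    ("Nancy", "Andrew"), ("Janet", "Andrew"), ("Margaret", "Andrew"),
    ("Steven", "Andrew"), ("Laura", "Andrew"),
    ("Michael", "Steven"), ("Robert", "Steven"), ("Anne", "Steven"),
]


def match_reports_to():
    """Like MATCH (e)<-[:REPORTS_TO]-(sub) RETURN *: one row per edge."""
    return [{"e": manager, "sub": sub} for sub, manager in REPORTS_TO]


def reports_to(sub, manager):
    """Like the Steven-to-Andrew pattern: does this directed edge exist?"""
    return (sub, manager) in REPORTS_TO


rows = match_reports_to()
print(len(rows), reports_to("Steven", "Andrew"))  # 8 True
```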
Real Query from a Customer
Find all direct reports and how many people they manage, each up to 3 levels down
(SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.pid AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.directly_manages AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager
JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT L1Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM( SELECT reportee.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT L2Reportees.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") )
Cypher Query
MATCH (manager)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(manager)
WHERE boss.name = "John Doe"
RETURN manager.name AS Manager, count(report) AS TotalReports
The same query with the intermediate node named sub:
MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
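Here is a plain-Python sketch of what the Cypher query computes, run over a hypothetical org chart. For every manager 0 to 3 hops below the boss, it counts the people 1 to 3 hops below that manager. It assumes a tree-shaped hierarchy, so each report is reached by exactly one path. The names and edges are made up.

```python
from collections import defaultdict

# Hypothetical org chart as (report, manager) REPORTS_TO edges.
EDGES = [
    ("A", "John Doe"), ("B", "John Doe"),
    ("C", "A"), ("D", "A"), ("E", "C"), ("F", "E"), ("G", "F"),
]

direct = defaultdict(list)  # manager -> direct reports (incoming REPORTS_TO)
for report, manager in EDGES:
    direct[manager].append(report)


def below(node, min_hops, max_hops):
    """People reached by following REPORTS_TO backwards min..max hops.
    Each person is found once because the hierarchy is assumed to be a tree."""
    found, frontier = [], [node]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            found.extend(frontier)
        frontier = [r for n in frontier for r in direct[n]]
    return found


def total_reports(boss):
    """Managers 0..3 hops below `boss` with their count of reports 1..3 hops
    below them; managers with no reports produce no row, as with MATCH."""
    result = {}
    for manager in below(boss, 0, 3):
        n = len(below(manager, 1, 3))
        if n:
            result[manager] = n
    return result


print(total_reports("John Doe"))  # {'John Doe': 5, 'A': 4, 'C': 3, 'E': 2}
```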
Express Complex Queries Easily with Cypher
“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.” Volker Pacher, Senior Developer
Who is in Robert’s (direct, upwards) reporting chain?
MATCH p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN p
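Following REPORTS_TO upward one hop at a time yields one path per ancestor, which matches the rows the variable-length pattern returns for Robert. Here is a plain-Python sketch, assuming an acyclic hierarchy and illustrative Northwind-style data:

```python
# Manager lookup modeled on the Northwind employees (illustrative data).
MANAGER = {"Robert": "Steven", "Michael": "Steven", "Anne": "Steven",
           "Steven": "Andrew", "Nancy": "Andrew", "Janet": "Andrew",
           "Margaret": "Andrew", "Laura": "Andrew"}


def upward_paths(sub):
    """Every path (sub)-[:REPORTS_TO*]->(e), one per ancestor.
    Assumes the chain has no cycles, otherwise this loop would not end."""
    paths, path = [], [sub]
    while path[-1] in MANAGER:
        path = path + [MANAGER[path[-1]]]  # new list, so stored paths stay intact
        paths.append(path)
    return paths


for p in upward_paths("Robert"):
    print(" -> ".join(p))
# Robert -> Steven
# Robert -> Steven -> Andrew
```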
Who’s the Big Boss?
MATCH p=(e:Employee)
WHERE NOT (e)-[:REPORTS_TO]->()
RETURN e.firstName as bigBoss
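The NOT (e)-[:REPORTS_TO]->() filter works like an anti-join: it keeps only employees with no outgoing REPORTS_TO relationship. Here is a minimal Python sketch with illustrative data:

```python
# REPORTS_TO edges as (subordinate, manager); illustrative Northwind-style data.
REPORTS_TO = [("Steven", "Andrew"), ("Nancy", "Andrew"), ("Robert", "Steven")]
EMPLOYEES = {"Andrew", "Steven", "Nancy", "Robert"}


def big_bosses():
    """Employees that never appear as the start of a REPORTS_TO edge."""
    has_manager = {sub for sub, _ in REPORTS_TO}
    return sorted(EMPLOYEES - has_manager)


print(big_bosses())  # ['Andrew']
```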
Product Cross-Sell
MATCH (choc:Product {productName: 'Chocolade'})<-[:PRODUCT]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product)
RETURN employee.firstName, other.productName, count(distinct o2) as count
ORDER BY count DESC
LIMIT 5;
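Here is the counting behind the cross-sell query, sketched in plain Python over hypothetical sales data. This sketch makes one assumption of its own: it skips the queried product itself, which the Cypher query does not filter out explicitly. Ties are broken alphabetically to make the output deterministic, whereas ORDER BY count DESC leaves the order of ties unspecified.

```python
from collections import defaultdict

# Hypothetical sales data: (employee)-[:SOLD]->(order) and (order)-[:PRODUCT]->(product).
SOLD = {"Nancy": ["o1", "o2", "o3"], "Janet": ["o4", "o5"]}
PRODUCT = {"o1": ["Chocolade", "Chai"], "o2": ["Chai"], "o3": ["Tofu"],
           "o4": ["Chocolade"], "o5": ["Tofu", "Chai"]}


def cross_sell(product, limit=5):
    # Employees who sold at least one order containing `product`.
    sellers = {e for e, orders in SOLD.items()
               if any(product in PRODUCT[o] for o in orders)}
    # (employee, other product) -> distinct orders, like count(distinct o2).
    orders_by_pair = defaultdict(set)
    for e in sellers:
        for o2 in SOLD[e]:
            for other in PRODUCT[o2]:
                if other != product:  # assumption of this sketch, see above
                    orders_by_pair[(e, other)].add(o2)
    rows = [(e, p, len(order_set)) for (e, p), order_set in orders_by_pair.items()]
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))  # count DESC, then alphabetical
    return rows[:limit]


print(cross_sell("Chocolade"))
# [('Nancy', 'Chai', 2), ('Janet', 'Chai', 1), ('Janet', 'Tofu', 1), ('Nancy', 'Tofu', 1)]
```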
High Performance
Cypher vs SQL: Paths
MATCH (u:User)-[:KNOWS*5..5]->(f5)
WHERE u.name = 'John'
RETURN count(f5) as size;
Cypher: Find the Size of John’s 5th-Degree Network
• 100k users
• 5M relationships
• Query took 5 min 30 s
• Returns a count of 312M
Neo4j config: page-cache = 512m, heap = 4G
SELECT count(*)
FROM user, user_friend as uf1, user_friend as uf2, user_friend as uf3,
     user_friend as uf4, user_friend as uf5, user as f5
WHERE user.name = 'John'
  AND user.id = uf1.user_1
  AND uf1.user_2 = uf2.user_1
  AND uf2.user_2 = uf3.user_1
  AND uf3.user_2 = uf4.user_1
  AND uf4.user_2 = uf5.user_1
  AND uf5.user_2 = f5.id;
SQL: Find the Size of John’s 5th-Degree Network
• 100k users
• 5M connections
• Query took 1 hr 55 min
• Returns 312M
MySQL config: key_buffer = 2G, join_buffer_size = 2G
SELECT count(*)
FROM user, user_friend as uf1, user_friend as uf2, user_friend as uf3,
     user_friend as uf4, user_friend as uf5
WHERE user.name = 'John'
  AND user.id = uf1.user_1
  AND uf1.user_2 = uf2.user_1
  AND uf2.user_2 = uf3.user_1
  AND uf3.user_2 = uf4.user_1
  AND uf4.user_2 = uf5.user_1;
SQL Optimized: Count Only on the JOIN Table
• 100k users
• 5M connections
• Query took 2 min 30 s
• Returns a count of 312M
MySQL config: key_buffer = 2G, join_buffer_size = 2G
MATCH (u:User)-[:KNOWS*4..4]->(f4)
WHERE u.name = 'John'
RETURN sum(size((f4)-[:KNOWS]->()))
Cypher Optimized: Only Sum the Degree of the Last Step
• 100k users
• 5M relationships
• Query takes 12 s
• Returns a count of 312M
Neo4j config: page-cache = 512m, heap = 4G
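Summing the degree of the last step works because every 5-hop walk is a 4-hop walk plus one outgoing relationship from its endpoint. The total therefore equals the sum of out-degrees over the 4-hop endpoints. The Python sketch below checks this on a hypothetical random graph. It also includes a per-node counting variant that avoids materializing each walk. Like the SQL self-join, it counts walks. Cypher's variable-length patterns never reuse a relationship within one path, so on graphs with cycles the numbers can differ.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the hypothetical graph is reproducible
USERS = 200
# Hypothetical directed KNOWS graph: each user knows 1 to 8 random other users.
knows = {}
for u in range(USERS):
    others = [v for v in range(USERS) if v != u]
    knows[u] = random.sample(others, random.randint(1, 8))


def walks_naive(start, length):
    """Materialize the endpoint of every walk, like the full self-join."""
    frontier = [start]
    for _ in range(length):
        frontier = [v for u in frontier for v in knows[u]]
    return len(frontier)


def walks_last_step_degree(start, length):
    """Stop one hop early and add up the out-degree of each endpoint."""
    frontier = [start]
    for _ in range(length - 1):
        frontier = [v for u in frontier for v in knows[u]]
    return sum(len(knows[u]) for u in frontier)


def walks_counted(start, length):
    """Same total, keeping one walk counter per node instead of one entry per walk."""
    counts = Counter({start: 1})
    for _ in range(length):
        nxt = Counter()
        for u, c in counts.items():
            for v in knows[u]:
                nxt[v] += c
        counts = nxt
    return sum(counts.values())


# The three totals agree for any start node.
print(walks_naive(0, 5), walks_last_step_degree(0, 5), walks_counted(0, 5))
```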
Neo4j Clustering Architecture: Optimized for Speed & Availability at Scale
Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across the cluster for very large graphs
Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
Diagram: a load balancer distributing requests across three Neo4j instances.
Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads with 10B+ records
• Up to 1M records per second
4.58 million things and their relationships…
Loads in 100 seconds!
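Three Ways to Load Data into Neo4j
Diagram, each option with an application on top:
• Migrate all data: the graph database holds all data.
• Migrate graph data: the relational database keeps the non-graph data and the graph database holds the graph data.
• Duplicate graph data: the relational database keeps all data and the graph database holds a copy of the graph data.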
Neo4j Fits into Your Enterprise Environment
Diagram (polyglot persistence):
• End users work through the application, which uses a Neo4j graph database cluster (three Neo4j instances) for data storage and business rules execution.
• Other databases sit alongside it: relational, NoSQL, Hadoop.
• A bulk analytic infrastructure (graph compute engine, EDW, …) handles data mining and aggregation.
• Data scientists run ad hoc analysis.
Neo4j + Mongo!
Users Love Neo4j
Learn the Way of the Graph Quickly and Easily
Quick Start in 1 minute
Quick Start: Plan Your Project
Timeline, weeks 1 to 8: Learn Neo4j • Decide on Architecture • Import and Model Data • Build Application • Test Application
Deploy your app in as little as 8 weeks
PROFESSIONAL SERVICES PLAN
There Are Lots of Ways to Easily Learn Neo4j
Huge Ecosystem of Graph Enthusiasts
• 1,000,000+ downloads
• 20,000+ education registrants
• 18,000+ Meetup members
• 100+ technology and service partners
• 150+ enterprise subscription customers, including 50+ Global 2000 companies
Get Started Now
Summary of the Power of the Graph
• Take relationships and connected data seriously
• Seriously easy to model
• Serious performance
• Fits in with your enterprise architecture
• Easy to get started
• Fast to reap the benefits
Start of Q&A