RDBMS to Graphs Harnessing the Power of the Graph September 2015 Ryan Boyd @ryguyrg


Post on 14-Apr-2017


Page 1: RDBMS to Graphs

RDBMS  to  Graphs  Harnessing  the  Power  of  the  Graph  

September  2015  

Ryan  Boyd  @ryguyrg  

Page 2: RDBMS to Graphs

Agenda  

• Origins of Neo4j
• Benefits of Graphs
• Designing your Graph Model
• Query time!
• Fitting Neo4j into your Enterprise Architecture
• Q&A

Page 3: RDBMS to Graphs

Neo  Technology  Overview  

Product
• Neo4j - World's leading graph database
• 150+ enterprise subscription customers including over 50 of the Global 2000

Company
• Neo Technology, Creator of Neo4j
• 100 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding

Page 4: RDBMS to Graphs

Neo4j Adoption by Selected Verticals

• Financial Services
• Communications
• Health & Life Sciences
• HR & Recruiting
• Media & Publishing
• Social Web
• Industry & Logistics
• Entertainment
• Consumer Retail
• Information Services
• Business Services

Page 5: RDBMS to Graphs

How Customers Use Neo4j

• Network & Data Center
• Master Data Management
• Social
• Recommendations
• Identity & Access
• Search & Discovery
• GEO

Page 6: RDBMS to Graphs

Neo4j Leads the Graph Database Revolution

“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”

“Neo4j is the current market leader in graph databases.”

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”

Sources:
• IT Market Clock for Database Management Systems, 2014 https://www.gartner.com/doc/2852717/it-market-clock-database-management
• TechRadar™: Enterprise DBMS, Q1 2014 http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
• Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates) http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/

Page 7: RDBMS to Graphs

High Business Value in Data Relationships

Data is increasing in volume…
• New digital processes
• More online transactions
• New social networks
• More devices

… and is getting more connected
Customers, products, processes, devices interact and relate to each other

Using Data Relationships unlocks value
• Real-time recommendations
• Fraud detection
• Master data management
• Network and IT operations
• Identity and access management
• Graph-based search

Early adopters became industry leaders

Page 8: RDBMS to Graphs

Relational DBs Can't Handle Relationships Well

• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market

… making traditional databases inappropriate when data relationships are valuable in real-time

• Slow development
• Poor performance
• Low scalability
• Hard to maintain

Page 9: RDBMS to Graphs

Modeling  as  a  Graph  

Page 10: RDBMS to Graphs

The  Whiteboard  Model  Is  the  Physical  Model  

Page 11: RDBMS to Graphs

Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

Diagram labels:
• PERSON node: name: "Dan", born: May 29, 1970, twitter: "@dan"
• PERSON node: name: "Ann", born: Dec 5, 1975
• CAR node: brand: "Volvo", model: "V70"
• Relationships: LOVES, LOVES, LIVES WITH
• Relationship property: since: Jan 10, 2011
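A minimal Python sketch of the property graph components on this slide: labeled nodes with name-value properties, and typed, directed relationships that can also carry properties. The class names, the `outgoing` helper, the LOVES directions and the placement of `since` on LIVES_WITH are assumptions made for illustration; the transcript does not say which relationship carries `since`, and none of this is Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes from the slide: two people and a car.
dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    # Assumption: the slide text does not say which relationship carries `since`.
    Relationship(dan, "LIVES_WITH", ann, {"since": "2011-01-10"}),
]


def outgoing(node, rel_type):
    """Return the end nodes of relationships of the given type leaving `node`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

For example, `outgoing(dan, "LOVES")` follows the relationship by type and direction and returns the node for Ann.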

Page 12: RDBMS to Graphs

Relational Versus Graph Models

Relational Model: Person, Person-Friend and Friend tables holding ANDREAS, TOBIAS, MICA and DELIA

Graph Model: ANDREAS, TOBIAS, MICA and DELIA as nodes connected by KNOWS relationships

Page 13: RDBMS to Graphs

Let’s  Model!  

 

Customer, Supplier, and Product (Master Data)

Orders (Activity)

Page 14: RDBMS to Graphs

The  Domain  Model  

Page 15: RDBMS to Graphs

Except…  

Page 16: RDBMS to Graphs

Northwind  Example!    

Page 17: RDBMS to Graphs

The Quintessential Northwind Example!

 

NOT  JUST  ANY  

Page 18: RDBMS to Graphs

(Northwind)-[:TO]->(Graph)

Building the Graph Model

Page 19: RDBMS to Graphs

Building Relationships in Graphs

Diagram: an Employee node connected by SOLD relationships to Order nodes

Page 20: RDBMS to Graphs

Locate  Foreign  Keys  

Page 21: RDBMS to Graphs

(FKs)-[:BECOME]->(Relationships)

Correct Directions

Page 22: RDBMS to Graphs

Simple Join Tables Become Relationships

Page 23: RDBMS to Graphs

Attributed Join Tables Become Relationships with Properties
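The three mapping rules above (foreign keys become relationships, simple join tables become relationships, attributed join tables become relationships with properties) can be sketched in Python as follows. The rows, column names and helper functions are hypothetical stand-ins for the relational tables; only the SOLD and PRODUCT relationship types come from the queries later in the deck.

```python
# Hypothetical rows standing in for relational tables.
orders = [
    {"order_id": 10248, "employee_id": 5},
    {"order_id": 10249, "employee_id": 6},
]
order_details = [
    {"order_id": 10248, "product_id": 11, "quantity": 12},
    {"order_id": 10248, "product_id": 42, "quantity": 10},
    {"order_id": 10249, "product_id": 14, "quantity": 9},
]


def fk_to_relationships(rows, fk, pk, rel_type):
    """Turn a foreign-key column into (start, type, end, properties) tuples.

    The row that holds the foreign key becomes the end node here, because the
    employee SOLD the order, not the other way round ("correct directions").
    """
    return [(("Employee", r[fk]), rel_type, ("Order", r[pk]), {}) for r in rows]


def join_table_to_relationships(rows, start_key, end_key, rel_type):
    """Turn each join-table row into one relationship; extra columns become properties."""
    rels = []
    for r in rows:
        props = {k: v for k, v in r.items() if k not in (start_key, end_key)}
        rels.append((("Order", r[start_key]), rel_type, ("Product", r[end_key]), props))
    return rels


sold = fk_to_relationships(orders, "employee_id", "order_id", "SOLD")
contains = join_table_to_relationships(order_details, "order_id", "product_id", "PRODUCT")
```

A join table with no extra columns yields relationships with empty property maps, which is the "simple join table" case.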

Page 24: RDBMS to Graphs

Working  Subset  (Today’s  Exercise)  

Page 25: RDBMS to Graphs

Northwind  Graph  Model  

Page 26: RDBMS to Graphs

Querying  Your  Data  

Page 27: RDBMS to Graphs
Page 28: RDBMS to Graphs
Page 29: RDBMS to Graphs

Basic  Query:  Who  do  people  report  to?  

MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})

Diagram: (Steven)-[:REPORTS_TO]->(Andrew). Each parenthesized pattern is a NODE, Employee is its LABEL, and firstName is a PROPERTY.

Page 30: RDBMS to Graphs

Basic  Query:  Who  do  people  report  to?  

MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN *

Page 31: RDBMS to Graphs

Basic  Query:  Who  do  people  report  to?  

Page 32: RDBMS to Graphs

Basic  Query:  Who  do  people  report  to?  

Page 33: RDBMS to Graphs

Real  Query  from  a  Customer  

Find all direct reports and how many people they manage, each up to 3 levels down

Page 34: RDBMS to Graphs

(SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.pid AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.directly_manages AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager  

JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT depth1Reportees.pid AS directReportees, count(depth2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count OUTER UNIONS FROM( SELECT reportee.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT L2Reportees.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid

JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid

WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") )

Page 35: RDBMS to Graphs

Real  Query  from  a  Customer  

Find all direct reports and how many people they manage, up to 3 levels down

Cypher Query

MATCH (manager)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(manager)
WHERE boss.name = "John Doe"
RETURN manager.name AS Manager, count(report) AS TotalReports

Page 36: RDBMS to Graphs

Real  Query  from  a  Customer  

Find all direct reports and how many people they manage, up to 3 levels down

Cypher Query

MATCH (manager)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(manager)
WHERE boss.name = "John Doe"
RETURN manager.name AS Manager, count(report) AS TotalReports

SQL Query

Page 37: RDBMS to Graphs

Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, up to 3 levels down

Cypher Query

MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total

SQL Query
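To make the variable-length pattern concrete, here is a rough Python sketch of what the Cypher query computes: every person 0 to 3 REPORTS_TO hops below the boss, each with a count of the people 1 to 3 hops below them. The org chart and function names are hypothetical, and the sketch assumes each person reports to at most one manager.

```python
from collections import defaultdict

# Hypothetical org chart: employee -> manager (one REPORTS_TO edge each).
REPORTS_TO = {
    "Ann": "John Doe",
    "Bob": "John Doe",
    "Cid": "Ann",
    "Dee": "Ann",
    "Eve": "Cid",
    "Fay": "Eve",
    "Gus": "Fay",
}


def reports_within(manager, max_depth, reports_to=REPORTS_TO):
    """Return everyone who reaches `manager` in 1..max_depth REPORTS_TO hops."""
    direct = defaultdict(list)
    for employee, boss in reports_to.items():
        direct[boss].append(employee)
    found, frontier = [], [manager]
    for _ in range(max_depth):
        frontier = [e for m in frontier for e in direct[m]]
        found.extend(frontier)
    return found


def total_reports(boss, max_depth=3):
    """Mirror the Cypher query: each person 0..max_depth levels under `boss`,
    with the number of people 1..max_depth levels under them."""
    people = [boss] + reports_within(boss, max_depth)
    counts = {p: len(reports_within(p, max_depth)) for p in people}
    # Like the MATCH, people with no reports produce no row.
    return {p: n for p, n in counts.items() if n > 0}
```

The depth limit is a single loop bound here, just as it is a single `*1..3` in the Cypher pattern, whereas the SQL version needs one more self-join per level.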

Page 38: RDBMS to Graphs

“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”

Volker Pacher, Senior Developer

Page 39: RDBMS to Graphs

Who is in Robert's (direct, upwards) reporting chain?

MATCH p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN p

Page 40: RDBMS to Graphs

Who is in Robert's (direct, upwards) reporting chain?

Page 41: RDBMS to Graphs

Who’s  the  Big  Boss?  

MATCH p=(e:Employee)
WHERE NOT (e)-[:REPORTS_TO]->()
RETURN e.firstName AS bigBoss
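The two hierarchy questions (Robert's upward reporting chain, and who has no manager) reduce to following or testing outgoing REPORTS_TO links. A small Python sketch of that logic, using hypothetical data in which only the Steven to Andrew link comes from the earlier slide:

```python
# Hypothetical REPORTS_TO data: employee -> manager, None for nobody.
REPORTS_TO = {"Robert": "Steven", "Steven": "Andrew", "Andrew": None, "Nancy": "Andrew"}


def reporting_chain(employee, reports_to=REPORTS_TO):
    """Walk REPORTS_TO upwards and return the managers above `employee`, nearest first."""
    chain = []
    manager = reports_to.get(employee)
    while manager is not None and manager not in chain:  # guard against cycles
        chain.append(manager)
        manager = reports_to.get(manager)
    return chain


def big_bosses(reports_to=REPORTS_TO):
    """Employees with no outgoing REPORTS_TO relationship."""
    return [e for e, m in reports_to.items() if m is None]
```

`reporting_chain` plays the role of the unbounded `[:REPORTS_TO*]` pattern, and `big_bosses` plays the role of the `WHERE NOT (e)-[:REPORTS_TO]->()` filter.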

Page 42: RDBMS to Graphs

Who’s  the  Big  Boss?  

Page 43: RDBMS to Graphs

Product Cross-Sell

MATCH (choc:Product {productName: 'Chocolade'})
      <-[:PRODUCT]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product)
RETURN employee.firstName, other.productName, count(distinct o2) as count
ORDER BY count DESC
LIMIT 5;

Page 44: RDBMS to Graphs

Product Cross-Sell

Page 45: RDBMS to Graphs

High  Performance    

Page 46: RDBMS to Graphs

Cypher vs SQL - Paths

Cypher: Find Size of John's 5th degree Network

MATCH (u:User)-[:KNOWS*5..5]->(f5)
WHERE u.name = 'John'
RETURN count(f5) as size;

● 100k Users
● 5M Relationships
● Query took 5 min, 30s
● Returns count of 312M

Neo4j config: page-cache = 512m, heap = 4G

Page 47: RDBMS to Graphs

Cypher vs SQL - Paths

SQL: Find Size of John's 5th degree Network

SELECT count(*)
FROM user, user_friend as uf1, user_friend as uf2, user_friend as uf3, user_friend as uf4, user_friend as uf5, user as f5
WHERE user.name='John' AND user.id = uf1.user_1 AND uf1.user_2 = uf2.user_1 AND uf2.user_2 = uf3.user_1 AND uf3.user_2 = uf4.user_1 AND uf4.user_2 = uf5.user_1 AND uf5.user_2 = f5.id;

● 100k Users
● 5M Connections
● Query took 1hr 55 mins
● Returns 312M

MySQL config: key_buffer = 2G, join_buffer_size = 2G

Page 48: RDBMS to Graphs

Cypher vs SQL - Paths

SQL Optimize: Only count on JOIN table

SELECT count(*)
FROM user, user_friend as uf1, user_friend as uf2, user_friend as uf3, user_friend as uf4, user_friend as uf5
WHERE user.name='John' AND user.id = uf1.user_1 AND uf1.user_2 = uf2.user_1 AND uf2.user_2 = uf3.user_1 AND uf3.user_2 = uf4.user_1 AND uf4.user_2 = uf5.user_1;

● 100k Users
● 5M Connections
● Query took 2 min, 30s
● Returns count of 312M

MySQL config: key_buffer = 2G, join_buffer_size = 2G

Page 49: RDBMS to Graphs

Cypher vs SQL - Paths

Cypher Optimize: Only sum degree of last step

MATCH (u:User)-[:KNOWS*4..4]->(f4)
WHERE u.name = 'John'
RETURN sum(size((f4)-[:KNOWS]->()))

● 100k Users
● 5M Relationships
● Query takes 12 sec
● Returns count of 312M

Neo4j config: page-cache = 512m, heap = 4G
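The optimization on this slide rests on a counting identity: the number of 5-hop paths equals the sum, over every 4-hop path, of the out-degree of its end node, so the last hop never has to be expanded. A small Python sketch of that idea on a hypothetical directed KNOWS graph (an acyclic one, so no relationship is ever revisited within a path); the function names are illustrative only.

```python
# Hypothetical KNOWS adjacency list (directed, acyclic).
KNOWS = {
    "John": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "c": ["e", "f"],
    "d": ["f"],
    "e": ["g"],
    "f": ["g", "h"],
    "g": ["i", "j"],
    "h": ["j"],
    "i": [],
    "j": [],
}


def walk_ends(start, steps, graph=KNOWS):
    """Return the end node of every walk of exactly `steps` hops from `start`."""
    ends = [start]
    for _ in range(steps):
        ends = [nxt for node in ends for nxt in graph[node]]
    return ends


def count_naive(start, steps, graph=KNOWS):
    """Materialise every full-length walk, then count them."""
    return len(walk_ends(start, steps, graph))


def count_by_degree(start, steps, graph=KNOWS):
    """Expand one hop less and sum the out-degree of each end node."""
    return sum(len(graph[node]) for node in walk_ends(start, steps - 1, graph))
```

Both functions return the same count for the same inputs; the second one simply avoids building the largest intermediate list, which is where most of the work goes as path counts grow.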

Page 50: RDBMS to Graphs

Neo4j Clustering Architecture Optimized for Speed & Availability at Scale

Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across cluster for very large graphs

Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery

Diagram: a Load Balancer in front of three Neo4j instances

Page 51: RDBMS to Graphs

Getting Data into Neo4j

Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships

Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads with 10B+ records
• Up to 1M records per second

4.58 million things and their relationships… Loads in 100 seconds!

Page 52: RDBMS to Graphs

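Three Ways to Load Data into Neo4j

• MIGRATE ALL DATA: the Application uses a Graph Database that holds all data
• MIGRATE GRAPH DATA: the Application uses a Relational Database for non-graph data and a Graph Database for graph data
• DUPLICATE GRAPH DATA: the Application uses a Relational Database that holds all data and a Graph Database that holds a copy of the graph data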

Page 53: RDBMS to Graphs

Polyglot  Persistence    

Page 54: RDBMS to Graphs

Neo4j Fits into Your Enterprise Environment

Diagram labels:
• End User, Application
• Graph Database Cluster (Neo4j, Neo4j, Neo4j): Data Storage and Business Rules Execution
• Data Scientist, Ad Hoc Analysis
• Bulk Analytic Infrastructure (Graph Compute Engine, EDW, …): Data Mining and Aggregation
• Databases: Relational, NoSQL, Hadoop

Page 55: RDBMS to Graphs

Neo4j  +  Mongo!  

Page 56: RDBMS to Graphs

Users  Love  Neo4j  

Page 57: RDBMS to Graphs

Users  Love  Neo4j  

Page 58: RDBMS to Graphs

Learn  the  Way  of  the  Graph  Quickly  and  Easily  

Page 59: RDBMS to Graphs

Quick  Start  in  1  minute  

Page 60: RDBMS to Graphs

Quick  Start:  Plan  Your  Project  

Timeline: weeks 1-8
• Learn Neo4j
• Decide on Architecture
• Import and Model Data
• Build Application
• Test Application

Deploy your app in as little as 8 weeks

PROFESSIONAL SERVICES PLAN

Page 61: RDBMS to Graphs

There  Are  Lots  of  Ways  to  Easily  Learn  Neo4j  

Page 62: RDBMS to Graphs

Huge  Ecosystem  of  Graph  Enthusiasts  

• 1,000,000+ downloads
• 20,000+ education registrants
• 18,000+ Meetup members
• 100+ technology and service partners
• 150+ enterprise subscription customers including 50+ Global 2000 companies

Page 63: RDBMS to Graphs

Get  Started  Now  

Page 64: RDBMS to Graphs

Summary  of  the  Power  of  the  Graph  

• Take relationships and connected data seriously
• Seriously easy to model
• Serious performance
• Fits in with your Enterprise Architecture
• Easy to get started
• Fast to reap the benefits

Page 65: RDBMS to Graphs

RDBMS  to  Graphs  Harnessing  the  Power  of  the  Graph  

Start  of  Q&A  

Ryan  Boyd  @ryguyrg