Graph Query Languages: update from LDBC

Upload: juan-sequeda

Post on 13-Apr-2017

Page 1: Graph Query Languages: update from LDBC

Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com

Graph Query Languages

Juan F. Sequeda, Ph.D.
Co-Founder, Capsenta

Smart Data – Graphorum Conference – January 20, 2017

@juansequeda @LDBCouncil

Page 2: Graph Query Languages: update from LDBC

Take-away message

• Graph databases need a standardized query language
• It's complicated
• "Those who fail to learn from history are doomed to repeat it"

Page 3: Graph Query Languages: update from LDBC

Linked Data Benchmark Council (LDBC)

• LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software.
• LDBC was established as an outcome of the LDBC EU project funded by the European Commission within the 7th Framework Programme (Grant Agreement No. 317548).

http://ldbcouncil.org/

Page 4: Graph Query Languages: update from LDBC

LDBC Organization (non-profit)

"sponsors"

+ non-profit members (FORTH, STI2) & personal members
+ Task Forces, volunteers developing benchmarks
+ TUC: Technical User Community (8 workshops, ~40 graph and RDF user case studies, 18 vendor presentations)

9th TUC Meeting, SAP Headquarters in Walldorf, Germany, February 9-10, 2017
http://ldbcouncil.org/9th-tuc-meeting-sap-headquarters-walldorf-germany-february-9-10-2017

Page 5: Graph Query Languages: update from LDBC

LDBC Benchmarks

http://ldbcouncil.org/benchmarks

• Graphalytics
• Semantic Publishing
• Social Network

VLDB 2016

SIGMOD 2015

Page 6: Graph Query Languages: update from LDBC

Graph Query Language Task Force

• Study query languages for graph data management systems, specifically systems storing "Property Graph" data
• Query language should cover the needs of important use cases: social network benchmark, interactive and BI workloads
• Devise a list of desired features and functionalities

• Renzo Angles, Universidad de Talca
• Marcelo Arenas, PUC Chile (task force lead)
• Pablo Barceló, Universidad de Chile
• Peter Boncz, Vrije Universiteit Amsterdam
• George Fletcher, Eindhoven University of Technology
• Claudio Gutierrez, Universidad de Chile
• Tobias Lindaaker, Neo Technology
• Marcus Paradies, SAP
• Raquel Pau, UPC
• Arnau Prat, UPC / Sparsity
• Juan Sequeda, Capsenta
• Oskar van Rest, Oracle Labs
• Hannes Voigt, TU Dresden
• Yinglong Xia, IBM

Page 7: Graph Query Languages: update from LDBC

Property Graph Data Model

[Figure: a Person node (id: 123, name: Juan Sequeda) connected by a knows edge (since: 2010) to a Person node (id: 456, name: Marcelo Arenas)]

• Neo4j's Cypher
• Oracle's PGQL
• TinkerPop's Gremlin

Page 8: Graph Query Languages: update from LDBC

Graph Query Languages Today

Page 9: Graph Query Languages: update from LDBC

Standard Syntax and Semantics

• Standard
  – Vendor lock-in
  – Prevents true performance competition → improvement of systems
• Syntax
  – Hard to define benchmarks
  – Hard to compare Graph DBMS
  – Applications are not portable
• Semantics
  – PGQL: iso/homomorphism
  – Cypher: "edge" isomorphism
  – Gremlin: homomorphism
  – SPARQL: homomorphism
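The difference between these matching semantics can be made concrete with a minimal Python sketch. The toy graph, the two-hop pattern (a)-[x]->(b)-[y]->(c) and the function name are assumptions chosen for illustration, and "isomorphism" is read here as requiring distinct nodes; this is not how any of the engines listed above evaluates queries.

```python
from itertools import product

# Hypothetical toy graph: edge id -> (source node, target node).
EDGES = {"e1": (1, 2), "e2": (2, 1), "e3": (2, 3), "e4": (3, 3)}

def two_hop_matches(edges, semantics):
    """Match the pattern (a)-[x]->(b)-[y]->(c) under one matching semantics.

    semantics is one of "homomorphism", "edge-isomorphism", "node-isomorphism".
    Returns the list of (x, y) edge-id pairs that match.
    """
    results = []
    for x, y in product(edges, repeat=2):
        a, b = edges[x]
        b2, c = edges[y]
        if b != b2:
            continue  # the two edges do not join on the middle node
        if semantics == "edge-isomorphism" and x == y:
            continue  # an edge may be bound to at most one pattern edge
        if semantics == "node-isomorphism" and len({a, b, c}) < 3:
            continue  # a node may be bound to at most one pattern node
        results.append((x, y))
    return results

for name in ("homomorphism", "edge-isomorphism", "node-isomorphism"):
    print(name, two_hop_matches(EDGES, name))
```

On this graph the same pattern yields five, four and one matches respectively, which is why the same query text can return different answers on different systems.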

Best Paper WWW2012

Angles and Gutierrez. The Expressive Power of SPARQL. ISWC2008

Best Paper ISWC2006. 10 Year Award 2016

Page 10: Graph Query Languages: update from LDBC

Closed Language

SQL

Graph Query Language*

however …

Page 11: Graph Query Languages: update from LDBC

Path Support

• Conjunctive Query (CQ)
• Regular Path Queries (RPQ): regular expression over edge labels
• Conjunctive Regular Path Query (CRPQ)
• Union Conjunctive Regular Path Query (UCRPQ)
• Path matching is supported. What about further processing of paths?
• Require for an (a,b)* RPQ that the sum of some property on all b edges along the path is greater than some given threshold
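As a rough illustration of this "further processing" requirement, the sketch below enumerates paths and keeps those whose b edges sum past a threshold. Several things are assumptions made only for the example: (a,b)* is read as "any sequence of edges labelled a or b", the property is a numeric weight, the graph is invented, and each edge may be used at most once per path so that the enumeration terminates.

```python
# Hypothetical edge list: (source, label, target, weight).
EDGES = [
    (1, "a", 2, 5),
    (2, "b", 3, 4),
    (3, "b", 4, 7),
    (2, "c", 4, 100),  # label c is not part of the expression, so it is ignored
]

def paths_over_threshold(edges, start, threshold):
    """Return (path, b_sum) for paths from start over a/b edges
    whose summed weight on b edges exceeds the threshold.

    A path is a list of indexes into the edge list.
    """
    results = []

    def walk(node, used, b_sum, path):
        if b_sum > threshold:
            results.append((path, b_sum))
        for i, (src, label, dst, weight) in enumerate(edges):
            if src == node and label in ("a", "b") and i not in used:
                extra = weight if label == "b" else 0
                walk(dst, used | {i}, b_sum + extra, path + [i])

    walk(start, frozenset(), 0, [])
    return results

print(paths_over_threshold(EDGES, start=1, threshold=10))
```

The point of the sketch is that the filter needs access to the edges of each matched path, not just to its endpoints, which plain RPQ matching does not expose.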

Page 12: Graph Query Languages: update from LDBC

Desired Query Functionalities

• Adjacency Queries
• Graph Pattern Matching
• Navigational Queries
• Aggregate Queries
• Sub queries

Page 13: Graph Query Languages: update from LDBC

Adjacency queries

• Property access
  – Get the firstName and lastName of a person having email "$email".
• Neighborhood of a node
  – Get the firstName and lastName of the friends of a person identified by email "$email".
• K-neighborhood of a node
  – Get the email, firstName and lastName of friends of the friends of a person having email "$email", excluding the start person (i.e. get a list of recommended friends; directed 2-neighborhood).
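The three adjacency queries can be sketched over a small in-memory graph. The persons, email addresses and function names below are invented for illustration, and the knows relation is stored as directed adjacency lists.

```python
# Hypothetical data: email -> properties, and email -> emails of directed friends.
PERSONS = {
    "ana@example.org": {"firstName": "Ana", "lastName": "Diaz"},
    "bob@example.org": {"firstName": "Bob", "lastName": "Evans"},
    "cy@example.org": {"firstName": "Cy", "lastName": "Ford"},
    "dee@example.org": {"firstName": "Dee", "lastName": "Gray"},
}
KNOWS = {
    "ana@example.org": ["bob@example.org", "cy@example.org"],
    "bob@example.org": ["ana@example.org", "cy@example.org", "dee@example.org"],
    "cy@example.org": ["dee@example.org"],
    "dee@example.org": [],
}

def names(email):
    """Property access: firstName and lastName of one person."""
    person = PERSONS[email]
    return (person["firstName"], person["lastName"])

def friend_names(email):
    """Neighborhood: names of the direct friends."""
    return [names(friend) for friend in KNOWS[email]]

def friends_of_friends(email):
    """Directed 2-neighborhood, excluding only the start person."""
    found = {fof for friend in KNOWS[email] for fof in KNOWS[friend]}
    found.discard(email)
    return sorted((e, *names(e)) for e in found)

print(names("ana@example.org"))
print(friend_names("ana@example.org"))
print(friends_of_friends("ana@example.org"))
```

Note that a direct friend can still appear in the 2-neighborhood, since the query as stated only excludes the start person.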

Page 14: Graph Query Languages: update from LDBC

Graph Pattern Matching

• Join
  – Get the creationDate and content of the messages created by a person identified by email "$email1" and commented on by another person identified by "$email2".
• Union
  – Get the creationDate and content of the messages either created or liked by a person identified by email "$email".
• Intersection
  – Get the email, firstName and lastName of the common friends of two persons identified by emails "$email1" and "$email2" respectively.
• Difference
  – Given two friends identified by emails "$email1" and "$email2" respectively, get the email, firstName and lastName of the friends of the second person who are not friends of the first person (this question is relevant for friendship recommendations).
• Optional
  – Given a person identified by email "$email", get the title of all the messages created by that person, and the content of the first comment replying to each message (if it exists).
• Filter
  – Get the properties of the people whose firstName includes the string "xxx" (this implies the use of wildcards).
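Several of these operators map directly onto set and dictionary operations. The sketch below covers intersection, difference, optional and filter on invented data; all names and values are hypothetical, and excluding the first person from the difference result is an added assumption, since two friends are trivially in each other's friend lists.

```python
# Hypothetical friend sets for two persons who are friends of each other.
FRIENDS = {
    "p1@example.org": {"p2@example.org", "a@example.org", "b@example.org"},
    "p2@example.org": {"p1@example.org", "b@example.org", "d@example.org"},
}
# Hypothetical messages by one person: id -> title, and id -> first reply content.
CREATED = {"m1": "Hello", "m2": "Anyone there?"}
FIRST_REPLY = {"m1": "Hi!"}
FIRST_NAMES = ["Maxine", "Alex", "Roxxx"]

def common_friends(email1, email2):
    """Intersection: friends shared by both persons."""
    return sorted(FRIENDS[email1] & FRIENDS[email2])

def recommended_friends(email1, email2):
    """Difference: friends of the second person not yet friends of the first."""
    return sorted(FRIENDS[email2] - FRIENDS[email1] - {email1})

def titles_with_first_reply():
    """Optional: keep every message, with None when no reply exists."""
    return [(title, FIRST_REPLY.get(mid)) for mid, title in CREATED.items()]

def names_containing(fragment):
    """Filter: substring test standing in for a wildcard match."""
    return [name for name in FIRST_NAMES if fragment in name]

print(common_friends("p1@example.org", "p2@example.org"))
print(recommended_friends("p1@example.org", "p2@example.org"))
print(titles_with_first_reply())
print(names_containing("xxx"))
```

The optional case behaves like a left outer join: the message without a reply is kept rather than dropped.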

Page 15: Graph Query Languages: update from LDBC

Navigational queries

• Reachability
  – Is there a friendship connection between two persons identified by emails "$email1" and "$email2" respectively?
• All Path Finding
  – Get the friendship paths between two persons identified by emails "$email1" and "$email2" respectively.
• Shortest Path Finding
  – Get the shortest friendship path between two persons identified by emails "$email1" and "$email2" respectively.
• Regular Path Query
  – Get the firstName of friends of the friends of the friends of a person identified by email "$email".
• Conjunctive Regular Path Queries
  – Given a target message created on "$dateTime" by a person identified by email "$email", for each comment replying to the target message, get the comment's content and the email of the comment's creator.
• Filtered regular path query
  – Given a person identified by email "$email", get the title of all the messages liked by that person between "$dateTime1" and "$dateTime2".
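Reachability and shortest path finding are commonly answered with a breadth-first search. The sketch below is one such approach, under the assumptions that friendship is stored as directed adjacency lists and that every edge has the same cost; the graph and the function names are invented for illustration.

```python
from collections import deque

# Hypothetical friendship graph: person -> list of persons they know.
KNOWS = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": [], "f": []}

def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk the parent links back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

def reachable(graph, source, target):
    """Reachability is shortest path finding with the path thrown away."""
    return shortest_path(graph, source, target) is not None

print(shortest_path(KNOWS, "a", "e"))
print(reachable(KNOWS, "a", "f"))
```

When several shortest paths exist (here via c or via d), the search returns whichever it discovers first, so a query language has to say whether one, any or all of them are returned.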

Page 16: Graph Query Languages: update from LDBC

Where are we now?

Page 17: Graph Query Languages: update from LDBC

Decision: Property Graph Data Model

In the following definition, we assume the existence of the following sets:
• L is an infinite set of (node and edge) labels;
• P is an infinite set of property names;
• V is an infinite set of literals (actual values).

Moreover, we assume that SET(X) is the set of all finite subsets of a given set X.

Then a property graph is a tuple G = (N, E, ρ, λ, σ), where:
• nodes: N is a finite set of nodes;
• edges: E is a finite set of edges such that N and E have no elements in common; ρ : E → (N × N) is a total function;
• labels: λ : (N ∪ E) → SET(L) is a total function;
• properties: σ : (N ∪ E) × P → V is a partial function.

Page 18: Graph Query Languages: update from LDBC

Property Graph Data Model

[Figure: a Person node (id: 123, name: Juan Sequeda) connected by a knows edge (since: 2010) to a Person node (id: 456, name: Marcelo Arenas)]

L is an infinite set of (node and edge) labels
P is an infinite set of property names
V is an infinite set of literals (actual values)
N is a finite set of nodes
E is a finite set of edges such that N and E have no elements in common

L = {Person, knows}
P = {id, name, since}
V = {"123", "456", "Juan Sequeda", "Marcelo Arenas", "2010"}
N = {n1, n2}
E = {e1}

ρ (edge): E → (N × N) is a total function
ρ = [ρ(e1) = (n1, n2)]

λ (labels): (N ∪ E) → SET(L) is a total function
λ = [λ(n1) = {Person}, λ(n2) = {Person}, λ(e1) = {knows}]

σ (properties): (N ∪ E) × P → V is a partial function
σ = [σ(n1, id) = "123", σ(n1, name) = "Juan Sequeda", σ(n2, id) = "456", σ(n2, name) = "Marcelo Arenas", σ(e1, since) = "2010"]
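The tuple G = (N, E, ρ, λ, σ) translates fairly directly into code. The sketch below is one possible encoding, not a reference implementation: the class and method names are hypothetical, λ is stored as a map to sets of labels and σ as a map keyed by (element, property name), following the function signatures in the definition.

```python
from dataclasses import dataclass, field

@dataclass
class PropertyGraph:
    nodes: set   # N
    edges: set   # E
    rho: dict    # edge -> (source node, target node); total on E
    lam: dict    # node or edge -> set of labels; total on N ∪ E
    sigma: dict = field(default_factory=dict)  # (node or edge, property) -> value; partial

    def is_valid(self):
        """Check the side conditions stated in the definition."""
        elements = self.nodes | self.edges
        return (
            self.nodes.isdisjoint(self.edges)
            and set(self.rho) == self.edges
            and all(s in self.nodes and t in self.nodes for s, t in self.rho.values())
            and set(self.lam) == elements
            and all(element in elements for element, _ in self.sigma)
        )

# The two-person example from this page.
G = PropertyGraph(
    nodes={"n1", "n2"},
    edges={"e1"},
    rho={"e1": ("n1", "n2")},
    lam={"n1": {"Person"}, "n2": {"Person"}, "e1": {"knows"}},
    sigma={
        ("n1", "id"): "123",
        ("n1", "name"): "Juan Sequeda",
        ("n2", "id"): "456",
        ("n2", "name"): "Marcelo Arenas",
        ("e1", "since"): "2010",
    },
)

print(G.is_valid())
print(G.sigma.get(("n1", "since")))  # σ is partial: no value for this pair
```

Keeping σ as a plain dictionary makes its partiality explicit: a missing key simply means the property is undefined for that node or edge.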

Page 19: Graph Query Languages: update from LDBC

Unified Data Model Options

• Extend the Relational model (where cells can contain actual values such as strings, integers, dates, …) with objects of type either NODE, EDGE, PATH or GRAPH
  – Embedding Graphs in a Relational Database
  – Similar to Postgres + JSON datatype
  – What happens if you join on a GRAPH column?

Page 20: Graph Query Languages: update from LDBC

Unified Data Model Options

• "Natural" Graph Data model

Data graph G (nodes 1 to 6, edges 10 to 16), one row per node with its nested out edges:

id  label   E.id  E.dest
1   purple  10    2
2   green   11    3
            13    5
3   green   12    4
            14    6
4   purple  no out edges
5   green   15    6
6   purple  16    4

Query 1:

SELECT x, y
FROM G (x:purple)-[e:*]->(y:purple)

Result relation:

id  x  y
90  1  4
91  1  6

Result as a graph (node 90 carries x=1, y=4; node 91 carries x=1, y=6):

id  x  y  E.id  E.dest
90  1  4  no out edges
91  1  6  no out edges

New Nodes Because of Relation Projection

Page 21: Graph Query Languages: update from LDBC

(1) Paths make things "weird"!

[Figure: data graph G]

Query 2a:

SELECT x, y, CHEAPEST PATH edgeId
FROM G (x:purple)-[e:*]->(y:purple)

Result relation:

id  x  y
90  1  4
91  1  6

Result as a graph (each path edge becomes a self edge on the result node, with edgeId and order properties):

id  x  y  E.id  E.dest  E.edgeId  E.order
90  1  4  20    90      10        1
          21    90      11        2
          22    90      12        3
91  1  6  23    91      10        1
          24    91      13        2
          25    91      15        3

Paths are Self Edges with an Order

Page 22: Graph Query Languages: update from LDBC

(2) Paths make things "weird"!

[Figure: data graph G]

Query 2b:

SELECT x, y, p
FROM G (x:purple)-[CHEAPEST p:+]->(y:purple)

Result relation:

id  x  y
90  1  4
91  1  6

Result as a graph (the path is stored as a value p on the result node):

id  x  y  p           E.id  E.dest
90  1  4  [10,11,12]  no out edges
91  1  6  [10,13,15]  no out edges

Paths are Datatypes
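The two options for representing paths (self edges with an order, as in Query 2a, versus a path-valued property, as in Query 2b) carry the same information. The sketch below converts between them using the rows shown for Query 2a; the tuple layout and function names are assumptions made for the example.

```python
from collections import defaultdict

# Rows from the Query 2a result: (result node, E.id, E.dest, E.edgeId, E.order).
SELF_EDGES = [
    (90, 20, 90, 10, 1),
    (90, 21, 90, 11, 2),
    (90, 22, 90, 12, 3),
    (91, 23, 91, 10, 1),
    (91, 24, 91, 13, 2),
    (91, 25, 91, 15, 3),
]

def to_path_values(rows):
    """Self edges with an order -> one ordered list of edge ids per result node."""
    grouped = defaultdict(list)
    for node, _edge, _dest, edge_id, order in rows:
        grouped[node].append((order, edge_id))
    return {node: [e for _, e in sorted(pairs)] for node, pairs in grouped.items()}

def to_self_edges(paths, first_edge_id=20):
    """Path values -> self edges, numbering the new edges from first_edge_id."""
    rows, next_id = [], first_edge_id
    for node, path in paths.items():
        for order, edge_id in enumerate(path, start=1):
            rows.append((node, next_id, node, edge_id, order))
            next_id += 1
    return rows

paths = to_path_values(SELF_EDGES)
print(paths)
print(to_self_edges(paths) == SELF_EDGES)
```

The round trip shows the trade-off: the self-edge form stays within the plain property graph model but needs an explicit order property, while the datatype form needs a new PATH value type.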

Page 23: Graph Query Languages: update from LDBC

(1) Graph Projection (instead of Relation Projection)

[Figure: data graph G]

Query 3a:

SELECT (x), (y)
FROM G (x:orange)-[:+]->(y:orange)

Result graph (nodes 1, 4 and 6, no edges):

id  label   E.id  E.dest
1   purple  no out edges
4   purple  no out edges
6   purple  no out edges

Remember: this is the Relational Projection

id  x  y
90  1  4
91  1  6

id  x  y  E.id  E.dest
90  1  4  no out edges
91  1  6  no out edges

Page 24: Graph Query Languages: update from LDBC

(2) Graph Projection (instead of Relation Projection)

[Figure: data graph G]

Query 3b:

SELECT (x)--(y)
FROM G (x:purple)-[:+]->(y:purple)

Result graph (nodes 1, 4 and 6, connected by new edges 21, 22 and 23):

id  label   E.id  E.dest
1   purple  21    4
            22    6
4   purple  23    6
6   purple  no out edges
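The contrast between the two projections can be sketched as two small constructors over the same match bindings. The function names and the nested-dictionary layout are assumptions for illustration, and the (x, y) bindings are copied from the result tables on these pages rather than computed by a query engine.

```python
def relation_projection(bindings, first_node_id=90):
    """Each result row becomes a new node that carries x and y and has no out edges."""
    return {
        first_node_id + i: {"x": x, "y": y, "out": []}
        for i, (x, y) in enumerate(bindings)
    }

def graph_projection(bindings, labels, first_edge_id=21):
    """Matched nodes are kept as nodes; each binding becomes a new edge x -> y."""
    result, next_id = {}, first_edge_id
    for x, y in bindings:
        for node in (x, y):
            result.setdefault(node, {"label": labels[node], "out": []})
        result[x]["out"].append((next_id, y))
        next_id += 1
    return result

LABELS = {1: "purple", 4: "purple", 6: "purple"}

# Bindings as listed in the Query 1 result relation.
print(relation_projection([(1, 4), (1, 6)]))
# Bindings as listed in the Query 3b result table.
print(graph_projection([(1, 4), (1, 6), (4, 6)], LABELS))
```

With graph projection the output is again a graph over the original node identities, which is what makes the language closed and lets one query's result feed another.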

Page 25: Graph Query Languages: update from LDBC

(1) Graph Projection with Paths

[Figure: data graph G]

Query 4a:

SELECT (PATH p)
FROM G (x:orange)-[CHEAPEST p:*]->(y:orange)
WHERE y.id = 4

Result graph (nodes 1, 2, 3, 4 and 6, with edges 10, 11, 12 and 16):

id  label   E.id  E.dest
1   purple  10    2
2   green   11    3
3   green   12    4
4   purple  no out edges
6   purple  16    6

Page 26: Graph Query Languages: update from LDBC

(2) Graph Projection with Paths

[Figure: data graph G]

Query 4b:

SELECT (x)-[:{l=LENGTH OF p}]-(y)
FROM G (x:orange)-[CHEAPEST p:*]->(y:orange)
WHERE y.id = 4

Result graph (nodes 1, 4 and 6; edge 21 with l=3, edge 23 with l=1):

id  label   E.id  E.dest  E.l
1   purple  21    4       3
4   purple  23    6       1
6   purple  no out edges

Page 27: Graph Query Languages: update from LDBC

Wrap up

• We want to hear from you!
• 9th LDBC Technical User Community Meeting, SAP Headquarters in Walldorf, Germany, February 9-10, 2017
• http://ldbcouncil.org/9th-tuc-meeting-sap-headquarters-walldorf-germany-february-9-10-2017

Page 28: Graph Query Languages: update from LDBC

THANK YOU

Juan Sequeda, Ph.D.
Co-Founder – [email protected]
@juansequeda

Sequeda J. Integrating Relational Databases with the Semantic Web. IOS Press. 2016
http://www.iospress.nl/book/integrating-relational-databases-with-the-semantic-web/