datastax: what's new in apache tinkerpop - the graph computing framework

What’s New in Apache TinkerPop? Open Source Graph Computing Framework http://tinkerpop.incubator.apache.org/ Stephen Mallette - @spmallette © 2015. All Rights Reserved.

Upload: datastax-academy

Post on 09-Jan-2017

TRANSCRIPT

What’s New in Apache TinkerPop?
Open Source Graph Computing Framework

http://tinkerpop.incubator.apache.org/

Stephen Mallette - @spmallette

© 2015. All Rights Reserved.

By Andrea Mann from London, United Kingdom (Flickr Uploaded by Hohum) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

Georgius Agricola, De re metallica 1556

“Woman at spinning wheel with man carding”, Smithfield Decretals (British Library, Royal 10 E. IV, fol. 147v), c. 1340
“Carding, Spinning and Weaving” by Giovanni Boccaccio, from De claris mulieribus, 15th century

London, British Library, Royal 18 E.iii (15th century) [Public domain], via Wikimedia Commons

[Public domain], via Wikimedia Commons

By Unknown. Photo credit: Yale University Art Gallery. In the Public Domain. [Public domain], via Wikimedia Commons

[Public domain], via Wikimedia Commons

By Dogcow (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons

By Adam Schuster (Flickr: Proto IBM) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

By Arnold Reinhold [CC BY-SA 2.5 (http://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons

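Graph Data Structure

[Diagram: a property graph with three vertices and two edges]
Vertex: label: person, name: Stephen
Vertex: label: book, title: Connections
Vertex: label: person, name: James
Edges: label: bought; label: wrote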
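As a minimal sketch, the property graph on this slide can be modelled in plain Python. The Vertex and Edge classes are hypothetical stand-ins, not TinkerPop types. The edge directions (Stephen bought the book, James wrote it) are an assumption, since the transcript preserves only the labels.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    # Hypothetical stand-in for a property-graph vertex:
    # one label plus arbitrary key/value properties.
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    # A directed, labelled edge pointing from out_v to in_v.
    label: str
    out_v: Vertex
    in_v: Vertex


stephen = Vertex("person", {"name": "Stephen"})
james = Vertex("person", {"name": "James"})
book = Vertex("book", {"title": "Connections"})

# Assumed directions: the slide text only gives the labels 'bought' and 'wrote'.
edges = [
    Edge("bought", stephen, book),
    Edge("wrote", james, book),
]

for e in edges:
    print(e.out_v.properties["name"], e.label, e.in_v.properties["title"])
```

Both vertices and edges carry a label, which is what lets a traversal later say "follow only the wrote edges".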

TinkerPop 2.0

TinkerPop 3.0

The TinkerPop Stack

Gremlin in TinkerPop3

is NOT “just [logo not captured in transcript]”

It is advised not to use lambda (λ) expressions

supports BOTH imperative and declarative querying

$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open("graph.properties")
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(gryo()).readGraph('data.kryo')
==>null
gremlin> graph
==>tinkergraph[vertices:1933 edges:4125]
gremlin>

[Diagram: schema of the sample graph. Vertex types: person, response, discussion. Edge labels: wrote, hasResponse, participatesIn, hasRoot.]

gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:1933 edges:4125], standard]
gremlin>

“Find the vertex with id 4608”

gremlin> g.V(4608)
==>v[4608]

[Diagram: g.V(4608) selects vertex 4608, a person]

“Get the value of the ‘userName’ property on vertex 4608”

gremlin> g.V(4608).values('userName')
==>Renlit

[Diagram: g.V(4608) selects person vertex 4608; .values('userName') returns its userName property, Renlit]

“Find the responses posted by ‘Renlit’”

gremlin> g.V(4608).out('wrote')
==>v[354560]
==>v[640768]
...
==>v[466432]

[Diagram: g.V(4608) selects person vertex 4608; .out('wrote') follows the wrote edges to response vertices]

“Find the number of responses posted by ‘Renlit’”

gremlin> g.V(4608).out('wrote').count()
==>67

[Diagram: g.V(4608) selects person vertex 4608; .out('wrote') reaches the response vertices; .count() returns 67]
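To make the step-by-step reading concrete, here is a rough plain-Python model of what out('wrote') and count() compute. It is a sketch, not how TinkerPop evaluates a traversal. The edge list is toy data: it holds only the three response ids visible on the slide plus one made-up edge, so it counts 3 rather than 67.

```python
# Toy edge list: (out_vertex_id, label, in_vertex_id).
# Only the three response ids visible on the slide are included, so the
# count here is 3 rather than the 67 reported for the full graph.
EDGES = [
    (4608, "wrote", 354560),
    (4608, "wrote", 640768),
    (4608, "wrote", 466432),
    (5888, "wrote", 111111),  # hypothetical edge for another author
]


def out(vertex_ids, label):
    """Yield the vertices reached by following outgoing `label` edges."""
    for v in vertex_ids:
        for out_v, edge_label, in_v in EDGES:
            if out_v == v and edge_label == label:
                yield in_v


# g.V(4608).out('wrote')
responses = list(out([4608], "wrote"))
print(responses)

# g.V(4608).out('wrote').count()
print(len(responses))
```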

gremlin> t = g.V(4608).out('wrote').count();null
==>null
gremlin> t.strategies.toList()
==>ConjunctionStrategy
==>IncidentToAdjacentStrategy
==>AdjacentToIncidentStrategy
==>IdentityRemovalStrategy
==>DedupBijectionStrategy
==>MatchPredicateStrategy
==>RangeByIsCountStrategy
==>TinkerGraphStepStrategy
==>ProfileStrategy
==>EngineDependentStrategy
==>ComputerVerificationStrategy
==>StandardVerificationStrategy

Strategy Application

Original Query: g.V(4608).out('wrote').count()

Strategy applied: AdjacentToIncidentStrategy

Post-Strategies: g.V(4608).outE('wrote').count()

Other strategies from t.strategies.toList(): ConjunctionStrategy, IncidentToAdjacentStrategy, IdentityRemovalStrategy, DedupBijectionStrategy, MatchPredicateStrategy, RangeByIsCountStrategy, TinkerGraphStepStrategy, ProfileStrategy, EngineDependentStrategy, ComputerVerificationStrategy, StandardVerificationStrategy
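The rewrite shown here, out('wrote').count() becoming outE('wrote').count(), can be sketched as a small rule over a list of steps. Counting the edges gives the same number as counting the vertices they lead to, so the adjacent vertices never need to be touched. The tuple representation and both function names below are hypothetical and only illustrate the idea; they are not TinkerPop's strategy API.

```python
def adjacent_to_incident(steps):
    """Rewrite out(label) -> outE(label) when the next step is count().

    `steps` is a list of (step_name, argument) tuples; a new list is returned.
    """
    rewritten = list(steps)
    for i in range(len(rewritten) - 1):
        name, arg = rewritten[i]
        next_name, _ = rewritten[i + 1]
        if name == "out" and next_name == "count":
            rewritten[i] = ("outE", arg)
    return rewritten


def render(steps):
    """Format the step list as a Gremlin-like string."""
    parts = []
    for name, arg in steps:
        shown = "" if arg is None else repr(arg)
        parts.append(f"{name}({shown})")
    return "g." + ".".join(parts)


original = [("V", 4608), ("out", "wrote"), ("count", None)]
optimized = adjacent_to_incident(original)

print(render(original))   # g.V(4608).out('wrote').count()
print(render(optimized))  # g.V(4608).outE('wrote').count()
```

A traversal that does not end in count() is left unchanged by this rule, because the vertices themselves are then needed.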

“Get a distribution over the authors who replied to ‘Renlit’”

gremlin> g.V(4608).as('a').out('wrote').out('hasResponse').in('wrote').where(neq('a')).groupCount().next()
==>v[5376]=4
==>v[2304]=2
==>v[5888]=7
...
==>v[10496]=1

[Diagram: person 4608 -wrote-> response -hasResponse-> response <-wrote- person. as('a') marks vertex 4608 and where(neq('a')) excludes it from the resulting authors.]
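The traversal above reads as: start at Renlit, walk to their posts, walk to the replies to those posts, walk back to the reply authors, drop Renlit, and tally. A plain-Python sketch of that logic follows. The dictionaries are hypothetical toy data (the response ids r1 to r6 are made up) and the code assumes each response has exactly one author.

```python
from collections import Counter

# Hypothetical toy data; the response ids r1..r6 are made up.
WROTE = {                     # person id -> responses they wrote
    4608: ["r1", "r2", "r6"],
    5376: ["r3"],
    5888: ["r4", "r5"],
}
HAS_RESPONSE = {              # response -> replies posted to it
    "r1": ["r3", "r4"],
    "r2": ["r5", "r6"],       # r6 is the person replying to their own post
}


def repliers(person):
    """Count how often each other author replied to `person`'s posts."""
    # Reverse index for in('wrote'); assumes one author per response.
    author_of = {r: p for p, rs in WROTE.items() for r in rs}
    counts = Counter()
    for post in WROTE.get(person, []):            # out('wrote')
        for reply in HAS_RESPONSE.get(post, []):  # out('hasResponse')
            author = author_of[reply]             # in('wrote')
            if author != person:                  # where(neq('a'))
                counts[author] += 1               # groupCount()
    return counts


print(repliers(4608))
```

Without the where(neq('a')) filter, the self-reply r6 would put vertex 4608 into its own distribution.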

“Get a distribution over the ‘responseLevel’ value for posts by ‘Renlit’”

gremlin> g.V(4608).out('wrote').values('responseLevel').groupCount()
==>[1:11, 2:19, 3:22, 4:9, 5:3, 6:3]
gremlin>

[Diagram: g.V(4608).out('wrote') reaches the response vertices; .values('responseLevel').groupCount() tallies their responseLevel property]

“Get a distribution over the ‘responseLevel’ for all posts in the graph”

gremlin> g.V().has('type','response').values('responseLevel').groupCount()
==>[1:358, 2:796, 3:445, 4:150, 5:57, 6:13, 7:4, 8:1]
gremlin>

[Diagram: g.V().has('type','response') selects every vertex whose type property is response; .values('responseLevel').groupCount() tallies their responseLevel property]

gremlin> g.V(4608).out('wrote').values('responseLevel').groupCount()
==>[1:11, 2:19, 3:22, 4:9, 5:3, 6:3]
gremlin> g.V().has('type','response').values('responseLevel').groupCount()
==>[1:358, 2:796, 3:445, 4:150, 5:57, 6:13, 7:4, 8:1]
gremlin>
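The two results are raw counts over very different totals (67 posts by Renlit against 1824 responses overall), so they are easier to compare as shares. The short sketch below takes the numbers from the console output above and normalizes each distribution; the variable and function names are illustrative only.

```python
# Counts copied from the two groupCount() results above.
renlit = {1: 11, 2: 19, 3: 22, 4: 9, 5: 3, 6: 3}
everyone = {1: 358, 2: 796, 3: 445, 4: 150, 5: 57, 6: 13, 7: 4, 8: 1}


def shares(counts):
    """Convert raw counts into fractions of the total."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


renlit_share = shares(renlit)
overall_share = shares(everyone)

print("level  Renlit  overall")
for level in sorted(everyone):
    # Levels missing from Renlit's posts count as a zero share.
    print(f"{level:>5}  {renlit_share.get(level, 0.0):6.1%}  {overall_share[level]:7.1%}")
```

The total of Renlit's counts, 67, matches the earlier count() result for the same traversal.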

gremlin> :install org.apache.tinkerpop hadoop-gremlin 3.0.0-incubating
==>Loaded: [org.apache.tinkerpop, hadoop-gremlin, 3.0.0-incubating] - restart the console to use [tinkerpop.hadoop]
gremlin> :exit
...
$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> :plugin use tinkerpop.hadoop
==>tinkerpop.hadoop activated
gremlin> hdfs.copyFromLocal('data.kryo', 'data.kryo')
==>null
gremlin> hdfs.ls()
==>rw-r--r-- smallette supergroup 5782840 data.kryo
gremlin>

gremlin> graph = GraphFactory.open('conf/hadoop/data-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],sparkgraphcomputer]
gremlin> g.V(4608).out('wrote').values('responseLevel').groupCount()
==>[1:11, 2:19, 3:22, 4:9, 5:3, 6:3]
gremlin> g.V().has('type','response').values('responseLevel').groupCount()
==>[1:358, 2:796, 3:445, 4:150, 5:57, 6:13, 7:4, 8:1]

Any Graph System

[Diagram: Gremlin steps such as g.V(4608), g.V(), out().in() and groupCount() shown alongside the graph systems below]

Neo4j, Titan, Sqlg, BlueMix, Hadoop, Giraph, Spark, OrientDB, ...

gremlin> :plugin use tinkerpop.gephi
==>tinkerpop.gephi activated
gremlin> :remote connect tinkerpop.gephi
==>Connection to Gephi - http://localhost:8080/workspace0 with stepDelay:1000, startRGBColor:[0.0, 1.0, 0.5], colorToFade:g, colorFadeRate:0.7, startSize:20.0,sizeDecrementRate:0.33
gremlin> :> graph
==>tinkergraph[vertices:1933 edges:4125]

gremlin> g.V(10240).values('userName')
==>Naya
gremlin> g.V(5888).values('userName')
==>Loret

gremlin> subGraph = g.V(10240,5888).repeat(__.outE().subgraph('subGraph').inV()).times(10).cap('subGraph').next()
==>tinkergraph[vertices:1152 edges:1343]
gremlin> :> subGraph

[Gephi visualization of subGraph, with the vertices for Naya and Loret labeled]
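The subgraph extraction above repeats one move ten times: take every outgoing edge of the current vertices, record it, and step to the vertex at the other end. A plain-Python sketch of that loop is given below. The function name, the tuple layout, and the response ids r1, r2, r3 and r9 are assumptions for illustration. It also collapses duplicate traversers into a set, which yields the same subgraph but is not how Gremlin tracks traversers.

```python
def edge_subgraph(edges, start_ids, hops):
    """Collect the edges reachable by following outgoing edges up to `hops` times.

    edges: iterable of (out_id, label, in_id) tuples.
    Returns (vertex_ids, edge_set) for the resulting subgraph.
    """
    out_edges = {}
    for e in edges:
        out_edges.setdefault(e[0], []).append(e)

    sub_edges = set()
    frontier = set(start_ids)
    for _ in range(hops):                      # times(hops)
        next_frontier = set()
        for v in frontier:
            for e in out_edges.get(v, []):     # outE()
                sub_edges.add(e)               # subgraph('subGraph')
                next_frontier.add(e[2])        # inV()
        frontier = next_frontier
    vertices = {e[0] for e in sub_edges} | {e[2] for e in sub_edges}
    return vertices, sub_edges


# Toy data: person ids come from the slides, response ids are made up.
EDGES = [
    (10240, "wrote", "r1"),
    ("r1", "hasResponse", "r2"),
    (5888, "wrote", "r2"),
    ("r2", "hasResponse", "r3"),
    (4608, "wrote", "r9"),   # not reachable from 10240 or 5888
]

vertices, sub_edges = edge_subgraph(EDGES, [10240, 5888], hops=10)
print(len(vertices), len(sub_edges))
```

Only edges reachable from the two starting people end up in the result, which is why the extracted subgraph on the slide is smaller than the full graph.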

gremlin> :remote config visualTraversal subGraph svg
==>Connection to Gephi - http://localhost:8080/workspace0 with stepDelay:1000, startRGBColor:[0.0, 1.0, 0.5], colorToFade:g, colorFadeRate:0.7, startSize:20.0,sizeDecrementRate:0.33
gremlin> svg
==>graphtraversalsource[tinkergraph[vertices:1152 edges:1343], standard]
gremlin> svg.strategies.toList()
==>ConjunctionStrategy
==>IncidentToAdjacentStrategy
==>AdjacentToIncidentStrategy
==>IdentityRemovalStrategy
==>FilterRankingStrategy
==>MatchPredicateStrategy
==>RangeByIsCountStrategy
==>TinkerGraphStepStrategy
==>EngineDependentStrategy
==>GephiTraversalVisualizationStrategy
==>ProfileStrategy
==>ComputerVerificationStrategy

gremlin> :> svg.V(10240).as('x').out('wrote').out('hasResponse').in('wrote').where(neq('x')).groupCount()
==>[v[5888]:4]

[Sequence of slides repeating this command while the Gephi visualization animates the traversal]


Takeaways

If you have connected data, use a Graph DB

If you use a Graph DB, consider Apache TinkerPop

If you use Apache TinkerPop, get started with Gremlin Console

Acknowledgements

Ketrina Yim - @KetrinaYim
Artist behind Gremlin and his friends

Joe Lee - http://jml3designz.com/
Graphic designer providing support on this presentation

Apache TinkerPop - http://tinkerpop.incubator.apache.org/
The TinkerPop Community
