gremlin's graph traversal machinery
TRANSCRIPT
Gremlin’s Graph Traversal Machinery
Dr. Marko A. RodriguezDirector of Engineering at DataStax, Inc.
Project Management Committee, Apache TinkerPop
http://tinkerpop.apache.org
f(x1) = x2
The function f maps the object x1 (from the set of X) to the object x2 (from the set of X).x1 2 X x2 2 X
90°
A traverser wraps a value of type V.
class Traverser<V> { V value;}
class Traverser<V> { V value;}
90°
The step maps an integer traverser to an integer traverser.
class Traverser<V> { V value;}
class Traverser<V> { V value;}
Traverser<Integer> Traverser<Integer>
90°
A traverser of with a rotation of 0° becomes a traverser with a rotation of 90°.
Traverser(0) Traverser(90)
class Traverser<V> { V value;}
class Traverser<V> { V value;}
Both the input and output traversers are of type Traverser<Integer>.
90°
T [N] ! T [N]
2 T [N]2 T [N]
A traverser can have a bulk which denotes how many V values it represents.
90°
class Traverser<V> { V value; long bulk;}
class Traverser<V> { V value; long bulk;}
44
Bulking groups identical traversers to reduce the number of evaluations of a step.
90°
class Traverser<V> { V value; long bulk;}
class Traverser<V> { V value; long bulk;}
1
21 1
12
Bulking can reduce the size of the stream.
90°
class Traverser<V> { V value; long bulk;}
class Traverser<V> { V value; long bulk;}
…then reordering can increase the likelihood of bulking.
90°1
21
total bulk = 4total count = 4
total bulk = 4total count = 3
f : X ! Y
Different functions can yield different mappings between different domains and ranges.
h : Y ! Z
The output of f can be the input to h because the range of f is the domain of h (i.e. Y).
Y y = f(x)
Z z = h(y)
90° 225°
315°
Different types of steps exist and their various compositions yield query expressivity.
90° 225°
315°
Steps process traverser streams/flows and their composition is a traversal/iterator.
= . (). ().next()90° 225°
repeat( ).times(2)
180°
Anonymous traversals can serve as step arguments.
90°
traversal with a single step
90°
repeat( ).times(2)
180°
Some functions serve as step-modulators instead of steps.
90°
…groupCount().by(out().count())
…select(“a”,”b”).by(“name”)
…addE(“knows
”).from(“a”)
.to(“b”)
…order().by(“age”,decr)
repeat( ).times(2)
≣
90° 90°
180°
During optimization, traversal strategies may rewrite a traversal to a more efficient,semantically equivalent form.
90°
RepeatUnrollStrategy
interface TraversalStrategy { void apply(Traversal traversal); Set<TraversalStrategy> applyPrior(); Set<TraversalStrategy> applyPost();}
repeat( ).until(0°)
4 loops2 loops1 loop
90° 180° 360°
90°do while(x != )
repeat().until() provides do/while-semantics.
90°
≣
until(0°).repeat( )
0 loops2 loops1 loop
90° 180° 0°
90°dowhile(x != )
90°
≣
until().repeat() provides while/do-semantics.
until(0°).repeat( )3
Even if the traversers in the input stream can not be bulked, the output traversers may be able to be.
90°
filter(x)map(x) sideEffect(x)flatMap(x)one-to-one one-to-many one-to-(one-or-none) one-to-same
out(“knows”)
has(“name”,”gremlin”)
groupCount(“m”)
where(“a”,eq(“b”))
select(“a”,”b”)
path()
mean()
sum()
count()
groupCount()man
y-to
-one
(red
ucer
s)
order()
values(“age”)
values(“name”)and(x,y)
or(x,y)
coin(0.5)
sample(10)
simplePath()
dedup()
store(“m”)
tree(“m”)
subgraph(“m”)
in(“created”)
group(“m”)
m
label()
id()
* A collection of examples. Not the full step library.
match(x,y,z)
properties()
outE(“knows”)
V()
values(“age”)
filter(x)map(x) sideEffect(x)flatMap(x)one-to-one one-to-many one-to-(one-or-none) one-to-same
out(“knows”)
has(“name”,”gremlin”)
path()
groupCount()
simplePath()
m
label()
outE(“knows”)
group(“m”)
V()
T [V [ E] ! T [N+]values(“age”)
filter(x)map(x) sideEffect(x)flatMap(x)one-to-one one-to-many one-to-(one-or-none) one-to-same
out(“knows”)
has(“name”,”gremlin”)
path()
groupCount()
simplePath()
m
label()
outE(“knows”)
T [?] ! T [path]
T [V [ E] ! T [string]T [?] ! T [?]
T [V [ E] ! ; [ T [V [ E]
T [?] ! ; [ T [?]
T [V ] ! T [V ]⇤
T [V ] ! T [E]⇤group(“m”)
V()T [?] ! T [map[?,N+]]T [G] ! T [V ]⇤
T [V [ E] ! T [N+]values(“age”)
out(“knows”)
has(“name”,”gremlin”)
groupCount()
T [V [ E] ! ; [ T [V [ E]
T [V ] ! T [V ]⇤
V()T [?] ! T [map[?,N+]]T [G] ! T [V ]⇤
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
T [V [ E] ! T [N+]values(“age”)
out(“knows”)
has(“name”,”gremlin”)
groupCount()
T [V ] ! T [V ]⇤
V()
T [?] ! T [map[?,N+]]
T [G] ! T [V ]⇤
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
Steps can be composed if their respective domains and ranges match.
T [V [ E] ! ; [ T [V [ E]
values(“age”)
out(“knows”)
has(“name”,”gremlin”)
groupCount()
T [V ] ! T [V ]⇤
V() T [G] ! T [V ]⇤
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
T [V ] ! ; [ T [V ]
T [V ] ! T [N+]
T [N+] ! T [map[N+,N+]]
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
What is the distribution of ages of the people that Gremlin knows?
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
one graph to many vertices (flatMap)
…
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
one graph to many vertices (flatMap) one vertex
to that vertex or no vertex(filter)
?…
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
one graph to many vertices (flatMap) one vertex
to that vertex or no vertex(filter)
one vertexto many friend vertices
(flatMap)
?…
name=gremlin
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
one graph to many vertices (flatMap) one vertex
to that vertex or no vertex(filter)
one vertexto many friend vertices
(flatMap)
one vertex to one age value
(map)
?…
37
name=gremlin
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
one graph to many vertices (flatMap) one vertex
to that vertex or no vertex(filter)
one vertexto many friend vertices
(flatMap)
one vertex to one age value
(map)many age values
to an age distribution(map — reducer)
?…
37 [37:2, 41:1, 24:1, 35:4]37
37
2435
3535
35 41
name=gremlin
a b c
a b c
Traversal creation via step composition
Step parameterization via traversal and constant nesting
a().b().c()
a(b().c()).d(x)d(x)
functi
on compositi
on
functi
on nest
ing
fluen
t meth
ods
method ar
gumen
ts
Any language that supports function composition and function nesting can host Gremlin.
Gremlin Traversal Language
class Traverser<V> { V value; long bulk;}
class Step<S,E> { Traverser<E> processNextStart();}
f(x)
class Traversal<S,E> implements Iterator<E> { E next(); Traverser<E> nextTraverser();}
The fundamental constructs of Gremlin’s machinery.
Gremlin Traversal Machine
interface TraversalStrategy { void apply(Traversal traversal); Set<TraversalStrategy> applyPrior(); Set<TraversalStrategy> applyPost();}
a db c
a de
≣
1
Bytecode
Gremlin-Javag.V(1). repeat(out(“knows”)).times(2). groupCount().by(“age”)
[[V, 1][repeat, [[out, knows]]][times, 2][groupCount][by, age]]
Traversal GraphStep GroupCountStep
RepeatStep
VertexStep
GraphStep GroupCountStepVertexStep VertexStep
TraversalStrategies
GraphStep GroupCountStepVertexStep VertexStep
translates
compiles
optimizes
executes
[29:2, 30:1, 31:1, 35:10]
1
Bytecode
Gremlin-Pythong.V(1). repeat(out(‘knows’)).times(2). groupCount().by(‘age’)
[[V, 1][repeat, [[out, knows]]][times, 2][groupCount][by, age]]
Traversal GraphStep GroupCountStep
RepeatStep
VertexStep
GraphStep GroupCountStepVertexStep VertexStep
TraversalStrategies
GraphStep GroupCountStepVertexStep VertexStep
translates
compiles
optimizes
executes
[29:2, 30:1, 31:1, 35:10]
1
Bytecode
Gremlin-Pythong.V(1). repeat(out(‘knows’)).times(2). groupCount().by(‘age’)
[[V, 1][repeat, [[out, knows]]][times, 2][groupCount][by, age]]
Traversal GraphStep GroupCountStep
RepeatStep
VertexStep
GraphStep GroupCountStepVertexStep VertexStep
TraversalStrategies
GraphStep GroupCountStepVertexStep VertexStep
translates
compiles
optimizes
executes
[29:2, 30:1, 31:1, 35:10]
Gremlin language variant
Language agnostic bytecode
Execution engine assembly
Execution engine optimization
Execution engine evaluation
Lang
uage
Mac
hine
Lang
uage
Mac
hine
Gremlin-Python
Gremlin-Groovy
Gremlin-Java
Gremlin-JavaScriptGremlin-Ruby
Gremlin-Scala
Gremlin-ClojureBytecode
Java-based Implementation?
?-based Implementation
? ?
Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
Gremlin-Python
CPython
Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection>>> graph = Graph()>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g'))
Gremlin-Python
DriverRemoteConnection
CPython
Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection>>> graph = Graph()>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g'))# nested traversal with Python slicing and attribute interception extensions
>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()[u'marko', u'marko']
Gremlin-Python
Bytecode
DriverRemoteConnection
Gremlin Traversal MachineCPython JVM
Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection>>> graph = Graph()>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g'))# nested traversal with Python slicing and attribute interception extensions
>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()[u'marko', u'marko']# a complex, nested multi-line traversal
>>> g.V().match( \... as_(“a”).out("created").as_(“b”), \... as_(“b”).in_(“created").as_(“c”), \... as_(“a”).out("knows").as_(“c”)). \... select("c"). \... union(in_(“knows"),out("created")). \... name.toList()[u'ripple', u'marko', u'lop']>>>
Gremlin-Python
Bytecode
DriverRemoteConnection
Gremlin Traversal MachineCPython JVM
Cypher
Bytecode
Distinct query languages (not only Gremlin language variants) can generate bytecode for evaluation by any OLTP/OLAP TinkerPop-enabled graph system.
Gremlin Traversal Machine
SELECT DISTINCT ?cWHERE { ?a v:label "person" . ?a e:created ?b . ?b v:name ?c . ?a v:age ?d . FILTER (?d > 30)}
[ [V], [match, [[as, a], [hasLabel, person]], [[as, a], [out, created], [as, b]], [[as, a], [has, age, gt(30)]]], [select, b], [by, name], [dedup]]
Byteco
de
GraphDatabase
OLTP
GraphProcessor
OLAP
Bytecode Byteco
de
BytecodeTinkerGraphDSE Graph
TitanNeo4j
OrientDBIBM Graph
…
TinkerComputerSparkGiraph
HadoopFulgora
…
Gremlin Traversal Machine
Traversal Trave
rsal
Traversal
Gremlin traversals can be executed against OLTP graph databases and OLAP graph processors.That means that if, e.g., SPARQL compiles to bytecode, it can execute both OLTP and OLAP.
GraphDatabase
OLTP
GraphProcessor
OLAP
Gremlin Traversal Machine
Traversal Trave
rsal
TraversalTraversalStrategies TraversalStrategies
optimizes
Graph system providers (OLTP/OLAP) typically have custom strategies registered with the Gremlin traversal machine that take advantage of unique, provider-specific features.
Most OLTP graph systems have a traversal strategy that combines [V,has*]-sequences into a single global index-based flatMap-step.
g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
one graph to many vertices using index lookup (flatMap)
GraphStepStrategy
one graph to many vertices (flatMap)
one vertex to that vertex or no vertex(filter)
?…
compiles
optimizes
name=gremlin
DataStax Enterprise Graph
Most OLAP graph systems have a traversal strategy that bypasses Traversal semanticsand implements reducers using the native API of the system.
g.V().count()
one graph to long (map — reducer)
rdd.count() 12,146,934
compiles
one graph to many vertices (flatMap)
many vertices to long(map — reducer)
… 12,146,934
optimizes
SparkInterceptorStrategy
…
Physical Machine
DataProgram Traversal
Heap/DiskMemory MemoryMemory/Graph System
Physical Machine Java Virtual Machine
byte
code
step
s
DataProgram
Memory/DiskMemory
Physical Machine
inst
ruct
ions
JavaVirtual Machine
GremlinTraversal Machine
From the physical computing machine to the Gremlin traversal machine.
Stakeholders
Language ProvidersGremlin Language VariantDistinct Query Language
Application Developers Graph System Providers
OLAP ProviderOLTP Provider
Stakeholders
Application Developers
One query language for all OLTP/OLAP systems. Gremlin
G = (V,E)
Real-time and analytic queries are represented in Gremlin.
GraphDatabase
OLTP
GraphProcessor
OLAP
Stakeholders
Application Developers
One query language for all OLTP/OLAP systems.
No vendor lock-in.
Apache TinkerPop as the JDBC for graphs.
DataStax Enterprise Graph
Stakeholders
Application Developers
One query language for all OLTP/OLAP systems.
No vendor lock-in.
Gremlin is embedded in the developer’s language.
Iterator<String> result = g.V().hasLabel(“person”). order().by(“age”). limit(10).values(“name”)
vs.
ResultSet result = statement.executeQuery( “SELECT name FROM People \n” + “ ORDER BY age \n” + “ LIMIT 10”)
Gremlin
-Java
SQL i
n Jav
a
No “fat strings.” The developer writes their graph database/processor queries in their native programming language.
Stakeholders
Language ProvidersGremlin Language VariantDistinct Query Language
Easy to generate bytecode.
GraphTraversal.getMethods() .findAll { GraphTraversal.class == it.returnType } .collect { it.name } .unique() .each { pythonClass.append(""" def ${it}(self, *args): self.bytecode.add_step(“${it}”, *args) return self“””)}
Gremlin-Python’s source code is programmatically generated using Java reflection.
Stakeholders
Language ProvidersGremlin Language VariantDistinct Query Language
Easy to generate bytecode.
Bytecode executes against TinkerPop-enabled systems.
Language providers write a translator for the Gremlin traversal machine,not a particular graph database/processor.
DataStax Enterprise Graph
GraphDatabase
OLTP
GraphProcessor
OLAP
Stakeholders
Language ProvidersGremlin Language VariantDistinct Query Language
Easy to generate bytecode.
Bytecode executes against TinkerPop-enabled systems.
Provider can focus on design, not evaluation.
Gremlin Traversal Machine
The language designer does not have to concern themselves with OLTP or OLAP execution. They simply generate bytecode and the
Gremlin traversal machine handles the rest.
Easy to implement core interfaces.
Graph System Providers
Stakeholders
OLAP ProviderOLTP Provider
Vertex
Edge
Graph
Transaction
TraversalStrategy
Propertykey=value
?? ?
Provider supports all provided languages.
Easy to implement core interfaces.
Graph System Providers
Stakeholders
OLAP ProviderOLTP Provider
The provider automatically supports all query languages that have compilers that generate Gremlin bytecode.
OLTP providers can leverage existing OLAP systems.
Provider supports all provided languages.
Easy to implement core interfaces.
Graph System Providers
Stakeholders
OLAP ProviderOLTP Provider
DSE Graph leverages SparkGraphComputer for OLAP processing.
DataStax Enterprise Graph
Stakeholders
Language ProvidersGremlin Language VariantDistinct Query Language
Application Developers Graph System Providers
OLAP ProviderOLTP Provider
One query language for all OLTP/OLAP systems.
No vendor lock-in.
Gremlin is embedded in the developer’s language.
Easy to generate bytecode.
Bytecode executes against TinkerPop-enabled systems.
Provider can focus on design, not evaluation.
Easy to implement core interfaces.
Provider supports all provided languages.
OLTP providers can leverage existing OLAP systems.
Thank you.
http://tinkerpop.apache.orghttp://www.datastax.com/products/datastax-enterprise-graph