the path forward


Post on 15-Jul-2015


Page 1: The Path Forward

MARKO A. RODRIGUEZ

http://THINKAURELIUS.COM

THE PATH FORWARD

TITAN 1.0 + TINKERPOP3

http://TINKERPOP.COM

Page 2: The Path Forward

PART 1: INTRODUCTION TO GRAPHS

Page 3: The Path Forward

VERTEX

Page 4: The Path Forward

software

Page 5: The Path Forward

software

LABEL

Page 6: The Path Forward

software

name:gremlin

Page 7: The Path Forward

software

name:gremlin

PROPERTY

KEY VALUE

Page 8: The Path Forward

software

name:gremlin use:queries

Page 9: The Path Forward

software

software

name:titan use:database

name:gremlin use:queries

Page 10: The Path Forward

software

software

name:titan use:database

name:gremlin use:queries

dependsOn

Page 11: The Path Forward

software

software

name:titan use:database

name:gremlin use:queries

EDGE

dependsOn

Page 12: The Path Forward

software

software

name:titan use:database

name:gremlin use:queries

dependsOn

since:0.1

Page 13: The Path Forward

GRAPH
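The vertex, label, property, and edge introduced on the preceding slides can be sketched as plain data structures. This is a minimal illustration, not Titan's or TinkerPop's actual classes: the class and field names are assumptions, and the ids 0, 2, and 4 are borrowed from the console output later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str                                        # e.g. "software"
    properties: dict = field(default_factory=dict)    # key -> value


@dataclass
class Edge:
    id: int
    label: str                                        # e.g. "dependsOn"
    out_vertex: Vertex                                # tail: where the edge starts
    in_vertex: Vertex                                 # head: where the edge points
    properties: dict = field(default_factory=dict)


# The two "software" vertices and the "dependsOn" edge from the slides.
gremlin = Vertex(0, "software", {"name": "gremlin", "use": "queries"})
titan = Vertex(2, "software", {"name": "titan", "use": "database"})
depends_on = Edge(4, "dependsOn", titan, gremlin, {"since": "0.1"})

print(depends_on.out_vertex.properties["name"],
      depends_on.label,
      depends_on.in_vertex.properties["name"])
```

A graph is then simply a collection of such vertices and edges.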

Page 14: The Path Forward

STRUCTURE

PROCESS

COMPUTING

Page 15: The Path Forward

GRAPH

TRAVERSAL

STRUCTURE

PROCESS

COMPUTING

Page 16: The Path Forward

GRAPH

TRAVERSAL

STRUCTURE

PROCESS

COMPUTING

GRAPH-BASED COMPUTING

Page 17: The Path Forward

WHY GRAPH-BASED COMPUTING?

Page 18: The Path Forward

INTUITIVE MODELING

WHY GRAPH-BASED COMPUTING?

Page 19: The Path Forward

INTUITIVE MODELING

EXPRESSIVE QUERYING

WHY GRAPH-BASED COMPUTING?

Page 20: The Path Forward

INTUITIVE MODELING

EXPRESSIVE QUERYING

NUMEROUS ANALYSES

Centrality

Mixing Patterns

Geodesics

Path Expressions

Ranking

Inference

Motifs

Scoring

WHY GRAPH-BASED COMPUTING?

Page 21: The Path Forward

f( )→

ANALYSES ARE THE EPIPHENOMENA OF TRAVERSAL

Page 22: The Path Forward

WHAT IS THE SIGNIFICANCE OF GRAPH ANALYSIS?

Page 23: The Path Forward

ANALYSES YIELD INSIGHTS ABOUT THE MODEL

=

DATA

PRODUCTS

DATA-DRIVEN

DECISION SUPPORT

Page 24: The Path Forward

RECOMMENDATION

People you may know.

Products you might like.

Movies you should watch and the friends you should watch them with.

SOCIAL GRAPH

RATINGS GRAPH

SOCIAL + RATINGS GRAPH

Page 25: The Path Forward

[Figure: a graph of seven vertices, numbered 0 to 6 and named hercules, cerberus, nemean, hydra, pluto, neptune, and jupiter, connected by eight "knows" edges.]

WHO ELSE MIGHT HERCULES KNOW?

Page 26: The Path Forward

[Figure: the same "knows" graph.]

gremlin> hercules
==>v[0]

Page 27: The Path Forward

[Figure: the same "knows" graph.]

gremlin> hercules.out('knows')
==>v[1]
==>v[2]
==>v[3]

Page 28: The Path Forward

[Figure: the same "knows" graph.]

gremlin> hercules.out('knows').out('knows')
==>v[4]
==>v[5]
==>v[5]
==>v[6]
==>v[5]

Page 29: The Path Forward

[Figure: the same "knows" graph.]

gremlin> hercules.out('knows').out('knows').groupCount()
==>v[4]=1
==>v[5]=3
==>v[6]=1

Page 30: The Path Forward

[Figure: the same graph with a ninth "knows" edge added.]

HERCULES PROBABLY KNOWS NEPTUNE
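The friend-of-a-friend reasoning on these slides can be sketched in a few lines. The exact "knows" edges are not recoverable from the transcript, so the adjacency below is an assumption chosen only so that the two-hop walk reproduces the console output shown above (v[4], v[5], v[5], v[6], v[5]); the helper `out` is a hypothetical stand-in for Gremlin's out('knows').

```python
from collections import Counter

# ASSUMPTION: one edge layout consistent with the console output on the slides.
knows = {
    0: [1, 2, 3],   # hercules knows v[1], v[2], v[3]
    1: [4, 5],
    2: [5, 6],
    3: [5],
}


def out(vertices, edges):
    """Yield every vertex reachable over one outgoing edge from each input vertex."""
    for v in vertices:
        yield from edges.get(v, [])


# hercules.out('knows').out('knows')
two_hops = list(out(out([0], knows), knows))

# .groupCount(): how many distinct paths end at each vertex
counts = Counter(two_hops)

print(two_hops)                # [4, 5, 5, 6, 5]
print(counts.most_common(1))   # [(5, 3)]
```

The vertex reached by the most paths, v[5], is the strongest candidate, which is how the deck arrives at neptune.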

Page 31: The Path Forward

[Figure: the same graph with "father" and "brother" edges added.]

HERCULES PROBABLY KNOWS NEPTUNE

...PROBABLY MORE SO WHEN OTHER TYPES OF EDGES ARE ANALYZED

Page 32: The Path Forward

[Figure: the same graph, including the "father" and "brother" edges.]

Page 33: The Path Forward

[Figure: the same graph with a "likes" edge added.]

Page 34: The Path Forward

[Figure: the same graph, including the "likes" edge.]

SOCIAL GRAPH

Page 35: The Path Forward

[Figure: the same graph with vertex 7, "human flesh", added.]

SOCIAL GRAPH

Page 36: The Path Forward

[Figure: the same graph with three "likes" edges.]

SOCIAL GRAPH

Page 37: The Path Forward

[Figure: the same graph with vertex 8, "tartarus", added.]

SOCIAL GRAPH

Page 38: The Path Forward

[Figure: the same graph with two more "likes" edges and a "dislikes" edge added.]

SOCIAL GRAPH

Page 39: The Path Forward

[Figure: the same graph, including the "likes" and "dislikes" edges.]

SOCIAL GRAPH

RATINGS GRAPH

Page 40: The Path Forward

[Figure: the same graph with "composedOf" and "smellsOf" edges added.]

NEMEAN MIGHT LIKE TARTARUS

SOCIAL GRAPH

RATINGS GRAPH

PRODUCT GRAPH

* Collaborative Filtering + Content-Based Recommendation

Page 41: The Path Forward

PATH FINDING

How is this person related to this film?

Which authors of this book also wrote a New York Times bestseller?

Which movies are based on a book by a New York Times bestseller?

MOVIE GRAPH

BOOK GRAPH

MOVIE + BOOK GRAPH

Page 42: The Path Forward

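[Figure: movie graph. hercules (0) and jupiter (6) are "depictedIn" the movie "hercules in new york" (7), which "hasActor" vertices 8 and 10. Each of those has a "role" edge to a character and an "actor" edge to a performer: arnold schwarzenegger (9) or ernest graves (11).]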

WHO PLAYED HERCULES IN WHAT MOVIE?

Page 43: The Path Forward

[Figure: the same movie graph.]

gremlin> hercules.match('hercules',

WHO PLAYED HERCULES IN WHAT MOVIE?

Page 44: The Path Forward

[Figure: the same movie graph.]

gremlin> hercules.match('hercules',
           g.of().as('hercules').out('depictedIn').as('movie'),

WHO PLAYED HERCULES IN WHAT MOVIE?

Page 45: The Path Forward

[Figure: the same movie graph.]

gremlin> hercules.match('hercules',
           g.of().as('hercules').out('depictedIn').as('movie'),
           g.of().as('movie').out('hasActor').as('actor'),

WHO PLAYED HERCULES IN WHAT MOVIE?

Page 46: The Path Forward

[Figure: the same movie graph.]

gremlin> hercules.match('hercules',
           g.of().as('hercules').out('depictedIn').as('movie'),
           g.of().as('movie').out('hasActor').as('actor'),
           g.of().as('actor').out('role').as('hercules'),

WHO PLAYED HERCULES IN WHAT MOVIE?

Page 47: The Path Forward

[Figure: the same movie graph.]

gremlin> hercules.match('hercules',
           g.of().as('hercules').out('depictedIn').as('movie'),
           g.of().as('movie').out('hasActor').as('actor'),
           g.of().as('actor').out('role').as('hercules'),
           g.of().as('actor').out('actor').as('star'))

WHO PLAYED HERCULES IN WHAT MOVIE?

Page 48: The Path Forward

[Figure: the same movie graph.]

gremlin> hercules.match('hercules',
           g.of().as('hercules').out('depictedIn').as('movie'),
           g.of().as('movie').out('hasActor').as('actor'),
           g.of().as('actor').out('role').as('hercules'),
           g.of().as('actor').out('actor').as('star')).select(['movie','star'])
==>[movie:v[7], star:v[9]]

WHO PLAYED HERCULES IN WHAT MOVIE?

Page 49: The Path Forward

[Figure: the same movie graph.]

gremlin> hercules.match('hercules',
           g.of().as('hercules').out('depictedIn').as('movie'),
           g.of().as('movie').out('hasActor').as('actor'),
           g.of().as('actor').out('role').as('hercules'),
           g.of().as('actor').out('actor').as('star')).select(['movie','star']){it.name}
==>[movie:hercules in new york, star:arnold schwarzenegger]

WHO PLAYED HERCULES IN WHAT MOVIE?
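The pattern match built up over these slides can be approximated with nested loops over a labelled edge list. This is only a sketch of the idea, not how match() is implemented: the edge list is reconstructed from the figure labels, and which of the intermediate vertices (8 or 10) belongs to hercules is an assumption, as are the helper names `out` and `who_played`.

```python
# (source, label, target) triples; ASSUMPTION: vertex 8 is the casting vertex
# that links hercules to arnold schwarzenegger, and vertex 10 links jupiter
# to ernest graves.
edges = [
    (0, "depictedIn", 7),
    (6, "depictedIn", 7),
    (7, "hasActor", 8),
    (7, "hasActor", 10),
    (8, "role", 0),
    (8, "actor", 9),
    (10, "role", 6),
    (10, "actor", 11),
]

names = {
    0: "hercules",
    6: "jupiter",
    7: "hercules in new york",
    9: "arnold schwarzenegger",
    11: "ernest graves",
}


def out(vertex, label):
    """Vertices adjacent to `vertex` over outgoing edges with the given label."""
    return [t for (s, l, t) in edges if s == vertex and l == label]


def who_played(character):
    """Bind 'movie' and 'star' so that all four match patterns hold at once."""
    results = []
    for movie in out(character, "depictedIn"):          # hercules -> movie
        for casting in out(movie, "hasActor"):           # movie -> actor
            if character in out(casting, "role"):        # actor -> hercules (closes the cycle)
                for star in out(casting, "actor"):       # actor -> star
                    results.append({"movie": movie, "star": star})
    return results


matches = who_played(0)
print(matches)
print([{k: names[v] for k, v in m.items()} for m in matches])
```

The "role" check is what discards the ernest graves branch: that casting vertex points back to jupiter, not hercules.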

Page 50: The Path Forward

[Figure: the same movie graph.]

Page 51: The Path Forward

[Figure: the same movie graph.]

Page 52: The Path Forward

[Figure: the movie graph extended with vertex 12, "the arms of hercules", and another "depictedIn" edge.]

Page 53: The Path Forward

[Figure: the same graph extended with vertex 13, "fred saberhagen", and a "writtenBy" edge.]

Page 54: The Path Forward

[Figure: the same graph extended with vertex 14, "albuquerque", and a "livesIn" edge.]

Page 55: The Path Forward

[Figure: the same graph extended with vertex 15, "santa fe", and a "25-North" edge.]

Page 56: The Path Forward

[Figure: the same graph extended with vertex 16, "marko rodriguez", and a second "livesIn" edge.]

Page 57: The Path Forward

[Figure: the same graph extended with a "thinksHeIs" edge.]

Page 58: The Path Forward

[Figure: the complete graph, from hercules through "hercules in new york", "the arms of hercules", fred saberhagen, albuquerque, and santa fe to marko rodriguez.]

MOVIE GRAPH

BOOK GRAPH

TRANSPORTATION GRAPH

PROFILE GRAPH

Page 59: The Path Forward

SOCIAL INFLUENCE

Who are the most influential people in java, mathematics, art, surreal art, politics, ...?

Which region of the social graph will propagate this advertisement the furthest?

Which 3 experts should review this submitted article?

Which people should I talk to at the upcoming conference and what topics should I talk to them about?

SOCIAL + COMMUNICATION + EXPERTISE + EVENT GRAPH

Page 60: The Path Forward

PATTERN IDENTIFICATION

This connectivity pattern is a sign of financial fraud. When this motif is found, a red flag will be raised.

Healthy discourse is typified by a discussion board with a branch factor in this range and a concept clique score in this range.

TRANSACTION GRAPH

DISCUSSION GRAPH

Page 61: The Path Forward

KNOWLEDGE DISCOVERY

The terms "ice", "fans", and "stanley cup" are classified as "sports".

Given that all identified birds fly, it can be deduced that all birds fly. If contrary evidence is provided, then this "fact" can be retracted.

WIKIPEDIA GRAPH

EVIDENTIAL LOGIC GRAPH

Page 62: The Path Forward

WORLD MODEL

Page 63: The Path Forward

WORLD MODEL

WORLD PROCESSES

Page 64: The Path Forward

WORLD MODEL

WORLD PROCESSES

A single world model and various types of traversers moving through that model to solve problems.

Page 65: The Path Forward

GRAPH

TRAVERSAL

STRUCTURE

PROCESS

COMPUTING

GRAPH-BASED COMPUTING

Page 66: The Path Forward

PART 2: THE PATH TO TITAN 1.0

Page 67: The Path Forward

WHY CREATE TITAN?

A number of Aurelius' clients...

...need to represent and process graphs at the 100+ billion edge scale w/ thousands of concurrent transactions.

...desire a free, open source distributed graph database.

...need both local graph traversals (OLTP) and batch graph processing (OLAP).

Page 68: The Path Forward

TITAN'S KEY FEATURES

Titan provides...

..."infinite size" graphs and "unlimited" users by means of a distributed storage engine.

...distribution via the liberal, free, open source Apache2 license.

...real-time local traversals (OLTP) and support for global batch processing via Hadoop (OLAP).

Page 69: The Path Forward

-OR-

BACKEND AGNOSTIC

Page 70: The Path Forward

titan$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@763861e6
gremlin> conf.setProperty("storage.backend","cassandra");
gremlin> conf.setProperty("storage.hostname","77.77.77.77");
gremlin> g = TitanFactory.open(conf);
==>titangraph[cassandra:77.77.77.77]
gremlin>

TITAN DISTRIBUTED VIA CASSANDRA

Page 71: The Path Forward

INHERITED FEATURES

Continuously available with no single point of failure.

Cassandra available at http://cassandra.apache.org/

No write bottlenecks to the graph as there is no master/slave architecture.

Elastic scalability allows for the introduction and removal of machines.

Caching layer ensures that continuously accessed data is available in memory.

Built-in replication ensures data is available during machine failure.

Page 72: The Path Forward

titan$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@763861e6
gremlin> conf.setProperty("storage.backend","hbase");
gremlin> conf.setProperty("storage.hostname","77.77.77.77");
gremlin> g = TitanFactory.open(conf);
==>titangraph[hbase:77.77.77.77]
gremlin>

TITAN DISTRIBUTED VIA HBASE

Page 73: The Path Forward

INHERITED FEATURES

Linear scalability with the addition of machines.

HBase available at http://hbase.apache.org/

Strictly consistent reads and writes.

HDFS-based data replication.

Base classes for backing Hadoop MapReduce jobs with HBase tables.

Generally good integration with the tools in the Hadoop ecosystem.

Page 74: The Path Forward

Titan is all about ...

Page 75: The Path Forward

Titan is all about numerous concurrent users...

Page 76: The Path Forward

Titan is all about numerous concurrent users... high availability...

Page 77: The Path Forward

Titan is all about numerous concurrent users... high availability...

dynamic scalability...

Page 78: The Path Forward

VERTEX-CENTRIC INDICES

Natural, real-world graphs contain vertices of high degree.

Even if rare, their degree ensures that they exist on many paths.

Traversing a high degree vertex means touching numerous incident edges and potentially touching most of the graph in only a few steps.

THE SUPER NODE PROBLEM

Page 79: The Path Forward

VERTEX-CENTRIC INDICES

A "super node" only exists from the vantage point of classic "textbook style" graphs.

In the world of property graphs, intelligent disk-level filtering can interpret a "super node" as a more manageable low-degree vertex.

Vertex-centric querying utilizes sort orders for speedy lookup of incident edges with particular qualities.

A SUPER NODE SOLUTION

Page 80: The Path Forward

VERTEX-CENTRIC INDICES: PUSHDOWN PREDICATES

[Figure: a vertex with three "knows" edges and five "likes" edges; the "likes" edges carry stars:5, stars:3, stars:3, stars:2, and stars:2.]

vertex.bothE()

8 edges

Page 81: The Path Forward

VERTEX-CENTRIC INDICES: PUSHDOWN PREDICATES

[Figure: the same vertex with only its outgoing edges: two "knows" edges and five "likes" edges (stars:5, stars:3, stars:3, stars:2, stars:2).]

vertex.outE()

7 edges

Page 82: The Path Forward

VERTEX-CENTRIC INDICES: PUSHDOWN PREDICATES

[Figure: the same vertex with only its five outgoing "likes" edges (stars:5, stars:3, stars:3, stars:2, stars:2).]

vertex.outE("likes")

5 edges

Page 83: The Path Forward

VERTEX-CENTRIC INDICES: PUSHDOWN PREDICATES

[Figure: the same vertex with only the outgoing "likes" edge that has stars:5.]

vertex.outE("likes").has("stars",5)

1 edge
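The narrowing from 8 edges to 1 across these four slides can be reproduced with a toy edge list. This sketch only counts what each successive predicate leaves behind; it says nothing about how Titan evaluates the predicates on disk. The tuple layout and the `incident` helper are assumptions, and the single incoming "knows" edge is inferred from the difference between bothE() (8 edges) and outE() (7 edges).

```python
# (direction, label, stars) for every edge incident to one vertex.
incident_edges = [
    ("in", "knows", None),
    ("out", "knows", None),
    ("out", "knows", None),
    ("out", "likes", 5),
    ("out", "likes", 3),
    ("out", "likes", 3),
    ("out", "likes", 2),
    ("out", "likes", 2),
]


def incident(direction=None, label=None, stars=None):
    """Return the incident edges that satisfy every predicate that was given."""
    return [
        e for e in incident_edges
        if (direction is None or e[0] == direction)
        and (label is None or e[1] == label)
        and (stars is None or e[2] == stars)
    ]


print(len(incident()))                                  # vertex.bothE()
print(len(incident(direction="out")))                   # vertex.outE()
print(len(incident(direction="out", label="likes")))    # vertex.outE("likes")
print(len(incident("out", "likes", 5)))                 # vertex.outE("likes").has("stars",5)
```

Each extra predicate shrinks the set of edges the traversal has to touch, which is the point of pushing predicates down to the storage layer.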

Page 84: The Path Forward

VERTEX-CENTRIC INDICES

[Figure: a vertex's incident edges as stored on disk: three "likes" edges (stars:1, stars:2, stars:5) and two "knows" edges.]

DISK-LEVEL SORTING/INDEXING

Page 85: The Path Forward

VERTEX-CENTRIC INDICES

[Figure: the same edges, grouped on disk into a "likes" block and a "knows" block.]

DISK-LEVEL SORTING/INDEXING

Page 86: The Path Forward

VERTEX-CENTRIC INDICES

[Figure: the same edges, with the "likes" block further divided into "likes w/ time 1-3" and "likes w/ time 4-5".]

DISK-LEVEL SORTING/INDEXING

Page 87: The Path Forward

VERTEX-CENTRIC INDICES

[Figure: the same on-disk layout, with "likes w/ time 1-3" and "likes w/ time 4-5" blocks.]

schema = g.openManagementSystem()
stars = schema
          .makePropertyKey("stars")
          .dataType(Integer.class)
          .make()
likes = schema
          .makeEdgeLabel('likes')
          .make()
schema.buildEdgeIndex(likes, 'timeIdx', BOTH, DESC, stars)

DISK-LEVEL SORTING/INDEXING
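Why a sort order makes these lookups fast can be shown with a sorted list and binary search. This is a rough in-memory analogy, assuming nothing about Titan's actual storage format: the neighbour names are hypothetical, and the descending order mirrors the DESC argument in the index definition above.

```python
import bisect

# "likes" edges of one vertex as (stars, neighbour); neighbours are hypothetical.
likes = [(1, "v10"), (2, "v11"), (5, "v12")]

# Keep the edges sorted by stars, descending, as a DESC edge index would.
index = sorted(likes, key=lambda edge: -edge[0])
# bisect needs ascending keys, so search on the negated star values.
keys = [-stars for stars, _ in index]


def likes_with_stars(stars):
    """Edges with exactly this rating, found by binary search instead of a scan."""
    lo = bisect.bisect_left(keys, -stars)
    hi = bisect.bisect_right(keys, -stars)
    return index[lo:hi]


def likes_with_at_least(stars):
    """Edges rated `stars` or higher: a contiguous prefix of the sorted index."""
    return index[:bisect.bisect_right(keys, -stars)]


print(likes_with_stars(5))
print(likes_with_at_least(2))
```

Because matching edges sit next to each other in the sorted order, a range predicate becomes one seek plus a sequential read rather than a scan over every incident edge.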

Page 88: The Path Forward

THE TITAN OLAP STORY

[Figure: a small example graph and its row layout. Each vertex row stores the vertex id and properties, followed by its incoming and outgoing edges; each edge is stored as id, properties (if any), label, and adjacent vertex id.]

Page 89: The Path Forward

AN ADJACENCY LIST

[Figure: twelve vertex rows, numbered 0 to 11.]

Page 90: The Path Forward

AN ADJACENCY LIST + CLUSTER

[Figure: the twelve vertex rows alongside a cluster of three machines: 127.0.0.2, 127.0.0.3, and 127.0.0.4.]

Page 91: The Path Forward

A DISTRIBUTED ADJACENCY LIST

[Figure: the twelve vertex rows distributed across the machines 127.0.0.2, 127.0.0.3, and 127.0.0.4.]
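One simple way to picture a distributed adjacency list is to assign each vertex row to a machine by a function of its id. The modulo rule below is purely an assumption for illustration; the slides do not say which placement scheme Titan or its storage backends use, and real systems typically hash or range-partition row keys.

```python
machines = ["127.0.0.2", "127.0.0.3", "127.0.0.4"]


def machine_for(vertex_id):
    """ASSUMED placement rule: vertex id modulo the number of machines."""
    return machines[vertex_id % len(machines)]


# Place the twelve vertex rows (0 to 11) from the figure.
placement = {}
for vertex_id in range(12):
    placement.setdefault(machine_for(vertex_id), []).append(vertex_id)

for machine, rows in placement.items():
    print(machine, rows)
```

Any client that knows the placement function can go straight to the machine that owns a given vertex row.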

Page 92: The Path Forward

Hadoop is a distributed computing platform composed of two key components:

HDFS: A distributed file system that stores arbitrarily large files within a cluster.

MapReduce: A parallel functional computing model for key/value pair data.

HADOOP

http://hadoop.apache.org
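The MapReduce model described above can be imitated in a single process to show how it applies to key/value graph data. This is a toy sketch, not Hadoop code: the adjacency list is hypothetical, and the sort-then-group step stands in for Hadoop's shuffle.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical adjacency list: vertex id -> outgoing neighbour ids.
adjacency = {0: [1, 2], 1: [2], 2: [0], 3: [2]}


def map_phase(vertex, neighbours):
    """Emit (neighbour, 1) for every outgoing edge."""
    for neighbour in neighbours:
        yield (neighbour, 1)


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return (key, sum(values))


def map_reduce(records):
    pairs = [kv for key, value in records for kv in map_phase(key, value)]
    pairs.sort(key=itemgetter(0))                       # stand-in for the shuffle/sort
    return dict(
        reduce_phase(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    )


in_degree = map_reduce(adjacency.items())
print(in_degree)
```

Here the job computes each vertex's in-degree; the same map, shuffle, reduce shape underlies the batch graph processing discussed on the following slides.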

Page 93: The Path Forward

FAUNUS AND HADOOP

[Figure: the twelve vertex rows across 127.0.0.2, 127.0.0.3, and 127.0.0.4, annotated "Structure" and "Process".]

Faunus provides graph input/output formats (structure) and a traversal language for graphs (process).

Page 94: The Path Forward

THE TITAN OLAP STORY

[Figure: the same vertex rows shown two ways: as a serial key/value data structure and as an indexed key/indexed value data structure.]

Page 95: The Path Forward

Adam Jacobs. 2009. The Pathologies of Big Data. Communications of the ACM 52, 8 (August 2009), 36-44. doi:10.1145/1536616.1536632 http://doi.acm.org/10.1145/1536616.1536632

Page 96: The Path Forward

INTRA-CLUSTER CONFIGURATION

Faunus/Hadoop, Titan/Cassandra

Data is processed on the machine where it is located. Limited network communication.

Page 97: The Path Forward

INTER-CLUSTER CONFIGURATION

Graph data is offloaded to another cluster. Repeated analysis does not interfere with the production graph database.

Page 98: The Path Forward

THE TITAN OLAP STORY

Titan currently provides a Hadoop OLAP engine: Titan-Hadoop (nicknamed Faunus).

Titan 1.0 will also provide an in-memory, off-heap, single-machine OLAP engine: Titan-Memory (nicknamed Fulgora).

Page 99: The Path Forward

PART 3: THE PATH TO TINKERPOP3

Page 100: The Path Forward

TINKERPOP3 FEATURES

A standard, vendor-agnostic graph query language.

Both OLTP and OLAP engines for evaluating queries.

A way for developers to create custom, domain specific variants.

A server system for pushing queries to the server for evaluation.

A vertex-centric computing framework for distributed graph algorithms.

Page 101: The Path Forward

gremlin>

Page 102: The Path Forward

gremlin> gremlin = g.addVertex(label,'software','name','gremlin')
==>v[0]

[Figure: vertex 0, label "software", name:gremlin.]

Page 103: The Path Forward

gremlin> gremlin = g.addVertex(label,'software','name','gremlin')
==>v[0]
gremlin> titan = g.addVertex(label,'software','name','titan')
==>v[2]

[Figure: vertex 0 (label "software", name:gremlin) and vertex 2 (label "software", name:titan).]

Page 104: The Path Forward

gremlin> gremlin = g.addVertex(label,'software','name','gremlin')
==>v[0]
gremlin> titan = g.addVertex(label,'software','name','titan')
==>v[2]
gremlin> titan.addEdge('dependsOn',gremlin,'since','0.1')
==>e[4][2-dependsOn->0]

[Figure: vertex 2 (name:titan) connected to vertex 0 (name:gremlin) by a "dependsOn" edge with since:0.1.]

Page 105: The Path Forward

Gremlin is a style of graph traversal that is heavily influenced by functional programming.

gremlin> gremlin = g.addVertex(label,'software','name','gremlin')
==>v[0]
gremlin> titan = g.addVertex(label,'software','name','titan')
==>v[2]
gremlin> titan.addEdge('dependsOn',gremlin,'since','0.1')
==>e[4][2-dependsOn->0]

//////////////////////////////////////////////////////////////////

gremlin> g.V
==>v[0]
==>v[2]
gremlin> g.V.has('name','titan')
==>v[2]
gremlin> g.V.has('name','titan').outE
==>e[4][2-dependsOn->0]
gremlin> g.V.has('name','titan').outE.since
==>0.1
gremlin> g.V.has('name','titan').out.name
==>gremlin

Page 106: The Path Forward

g.V.has('name','titan').out.name

What are the names of the vertices outgoing adjacent to Titan?

Page 107: The Path Forward

[Figure: vertices 0 (name=gremlin) and 2 (name=titan).]

Traversals are defined using fluent function composition.

For the graph 'g'...

g.V.has('name','titan').out.name

Page 108: The Path Forward

Objects flow through the traversal in a lazy evaluation manner.

[Figure: vertices 0 (name=gremlin) and 2 (name=titan).]

…get all the vertices...

g.V.has('name','titan').out.name

Page 109: The Path Forward

[Figure: vertex 2 (name=titan).]

The general functions are map(), flatMap(), filter(), sideEffect(), and branch().

…filter out all vertices that don't have the name 'titan'…

g.V.has('name','titan').out.name

Page 110: The Path Forward

[Figure: vertex 0 (name=gremlin).]

Any programming language can provide a Gremlin implementation.

…get all outgoing adjacent vertices…

g.V.has('name','titan').out.name

Page 111: The Path Forward

gremlin

Any graph system vendor can provide a binding to Gremlin.

…get the names of those vertices...

g.V.has('name','titan').out.name
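The step-by-step walk through g.V.has('name','titan').out.name on the preceding slides can be mimicked with Python generators, which are also lazy. This is an analogy rather than a Gremlin implementation: the function names only echo the steps, and the `trace` list is a hypothetical device for showing when work actually happens.

```python
vertices = {0: {"name": "gremlin"}, 2: {"name": "titan"}}
out_edges = {2: [0]}          # titan -dependsOn-> gremlin
trace = []                    # records when vertices are actually pulled


def V():
    """Source step: emit every vertex id."""
    for vertex_id in vertices:
        trace.append(("V", vertex_id))
        yield vertex_id


def has(traversers, key, value):
    """filter: keep only vertices whose property matches."""
    return (v for v in traversers if vertices[v].get(key) == value)


def out(traversers):
    """flatMap: each vertex expands to its outgoing adjacent vertices."""
    return (n for v in traversers for n in out_edges.get(v, []))


def values(traversers, key):
    """map: each vertex becomes one property value."""
    return (vertices[v][key] for v in traversers)


# Composing the steps does no work yet.
traversal = values(out(has(V(), "name", "titan")), "name")
print(trace)

# Objects flow through only when the result is iterated.
result = list(traversal)
print(result)
print(trace)
```

Nothing is read from the graph until the final list() call pulls objects through the chain, one step at a time.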

Page 112: The Path Forward

THE GENERIC STEPS

[Figure: the five generic step types, drawn as functions from start objects (S) to end objects (E).]

filter: has(), retain(), except(), interval(), simplePath(), dedup(), timeLimit(), ...

map: back(), fold(), valueMap(), match(), orderBy(), shuffle(), ...

flatMap: out(), outE(), inV(), both(), values(), unfold(), ...

sideEffect: aggregate(), store(), groupCount(), groupBy(), tree(), count(), subgraph(), ...

branch: jump(), until(), choose(), union(), ...

Page 113: The Path Forward

TINKERPOP'S OLAP STORY

Execute parallel, distributed Gremlin traversals.

Execute vertex-centric message passing algorithms.

Page 114: The Path Forward

TINKERPOP'S OLAP STORY

Execute parallel, distributed Gremlin traversals.

Execute vertex-centric message passing algorithms.

traversers are the messages!

Page 115: The Path Forward

OLTP

✓ Real-time queries.
✓ Touching a subset of the graph.
✓ Numerous concurrent queries.
✓ Good for local graph mutations.

OLAP

✓ Long running queries.
✓ Touching the entire graph.
✓ Single user.
✓ Good for bulk loading/mutations.

Page 116: The Path Forward

1. Logically distribute the vertex program to all the vertices in the graph.
2. The vertices execute the vertex program, sending and receiving messages on each iteration.
3. Any number of MapReduce jobs can follow to aggregate the computed data in the graph.

[Figure: g.compute() yields a GraphComputer; the VertexProgram is distributed to all workers (worker 1, worker 2, worker 3), followed by MapReduce jobs.]
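The three steps above can be sketched as a single-process loop in which every vertex sends messages along its outgoing edges and then updates its own state from its inbox. This is a simplified stand-in for a vertex program, not the PageRankVertexProgram class itself: the update rule, the alpha value, and the function name are assumptions, and the edge list is assumed to be TinkerPop's classic six-vertex toy graph.

```python
def page_rank(out_edges, iterations=30, alpha=0.85):
    """Message-passing PageRank: each iteration is one send phase and one receive phase."""
    vertices = sorted(set(out_edges) | {t for ts in out_edges.values() for t in ts})
    rank = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        # Send: every vertex splits its rank evenly over its outgoing edges.
        inbox = {v: [] for v in vertices}
        for v in vertices:
            targets = out_edges.get(v, [])
            for target in targets:
                inbox[target].append(rank[v] / len(targets))
        # Receive: every vertex computes its new rank from the messages it got.
        rank = {v: (1 - alpha) + alpha * sum(inbox[v]) for v in vertices}
    return rank


# ASSUMPTION: edges of the classic TinkerPop toy graph (6 vertices, 6 edges).
toy_graph = {1: [2, 3, 4], 4: [3, 5], 6: [3]}

ranks = page_rank(toy_graph)
for vertex, value in sorted(ranks.items()):
    print(vertex, value)
```

A follow-up aggregation over the finished ranks, such as picking the top vertex with max(ranks, key=ranks.get), plays the role of the MapReduce jobs in step 3.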

Page 117: The Path Forward

gremlin> result = g.compute().program(PageRankVertexProgram).submit()
==>result[tinkergraph[vertices:6 edges:6],memory[size:0]]
gremlin> result.memory.iteration
==>30
gremlin> result.graph
==>tinkergraph[vertices:6 edges:6]
gremlin> result.graph.V.map{ [it.id,it.pageRank]}
==>[1, 0.15000000000000002]
==>[2, 0.19250000000000003]
==>[3, 0.4018125]
==>[4, 0.19250000000000003]
==>[5, 0.23181250000000003]
==>[6, 0.15000000000000002]

* Minor tweaks to example to make it fit nicely on a single line.

Page 118: The Path Forward

OLTP

gremlin> g.V.both.both.name.groupCount()
==>[ripple:3, peter:3, vadas:3, josh:7, lop:7, marko:7]

OLAP

gremlin> g.V.both.both.name.groupCount().submit(g.compute())
==>[ripple:3, peter:3, vadas:3, josh:7, lop:7, marko:7]

Page 119: The Path Forward

MULTIPLE GRAPH COMPUTERS FOR THE SAME GRAPH

gremlin> g = TitanFactory.open(conf)
==>titangraph[66.77.88.99]
gremlin> g.compute(FaunusGraphComputer)
           .program(PageRankVertexProgram)
           .submit()
==>result[faunusgraph,memory[size:0]]

gremlin> g = TitanFactory.open(conf)
==>titangraph[66.77.88.99]
gremlin> g.compute(FulgoraGraphComputer)
           .program(PageRankVertexProgram)
           .submit()
==>result[fulgoragraph,memory[size:0]]

Page 120: The Path Forward

GREMLIN SERVER

Page 121: The Path Forward

GREMLIN SERVER

gremlin> g = TitanFactory.open(conf)
==>titangraph[66.77.88.99]

localhost

Page 122: The Path Forward

GREMLIN SERVER

gremlin> g = TitanFactory.open(conf)
==>titangraph[66.77.88.99]
gremlin> g.V.has('name','titan')
==>v[2]

localhost

Page 123: The Path Forward

GREMLIN SERVER

gremlin> g = TitanFactory.open(conf)
==>titangraph[66.77.88.99]
gremlin> g.V.has('name','titan')
==>v[2]
gremlin> g.V.has('name','titan').out

localhost

1. get the titan vertex

2. get titan's adjacents

Page 124: The Path Forward

GREMLIN SERVER

gremlin> g = TitanFactory.open(conf)
==>titangraph[66.77.88.99]
gremlin> g.V.has('name','titan')
==>v[2]
gremlin> g.V.has('name','titan').out

localhost

1. get the titan vertex

2. get titan's adjacents

CHATTY

Each step in the traversal is a round trip

Page 125: The Path Forward

GREMLIN SERVER

Page 126: The Path Forward

GREMLIN SERVER

Page 127: The Path Forward

GREMLIN SERVER

gremlin> :remote connect tinkerpop.server titan.yaml
==>connected - 66.77.88.99:8182,11.22.33.44:8182,...

localhost

Page 128: The Path Forward

GREMLIN SERVER

gremlin> :remote connect tinkerpop.server titan.yaml
==>connected - 66.77.88.99:8182,11.22.33.44:8182,...
gremlin> :> g.V.has('name','titan')
==>v[2]

localhost

g.V.has('name','titan')

Page 129: The Path Forward

GREMLIN SERVER

gremlin> :remote connect tinkerpop.server titan.yaml
==>connected - 66.77.88.99:8182,11.22.33.44:8182,...
gremlin> :> g.V.has('name','titan')
==>v[2]
gremlin> :> g.V.has('name','titan').out

localhost

g.V.has('name','titan').out

LESS CHATTY

1. Gremlin Server client knows the hash function to query the machine with the Titan vertex

2. Push the traversal to the server and get a WebSockets iterator back

Page 130: The Path Forward

GREMLIN SERVER

Allows non-JVM languages to interact with a remote graph.

Supports stored procedures -- MyAlgorithms.class

Supports WebSockets, REST, and various serialization mechanisms.

Page 131: The Path Forward

CREDITS

PRESENTED BY

MARKO A. RODRIGUEZ

MANY THANKS TO

MATTHIAS BRÖCHELER

STEPHEN MALLETTE

DAN LAROCQUE

DANIEL KUPPITZ

BOB BRIODY

BRYN COOKE

JOSH SHINAVIER

PAVEL YASKEVICH

AURELIUS COMMUNITY

TINKERPOP COMMUNITY

KETRINA YIM