Give sense to your Big Data with Apache
TinkerPop™ and property-graph databases
DuyHai DOAN
Apache Cassandra™ evangelist
@doanduyhai
Who am I?
• Technical advocate for Apache Cassandra™ at DataStax
• Committer for Apache Zeppelin™ and maintainer of the Zeppelin/Cassandra interpreter
• @doanduyhai
Who is DataStax?
• Company offering DataStax Enterprise, a commercial distribution of Apache Cassandra™
• DataStax Enterprise == Apache Cassandra™ ++ features
Why graph databases?
As of 2017
Who is not using any of those apps?
Needle in a haystack
Finding patterns
Root-cause analysis
Impact propagation
Everything is connected
Graph databases are trending
Graph vs Relational
Relational databases
Users: personId, firstname, lastname, …
Movies: movieId, title, country, …
View: personId, movieId, view_time, …
• Define the relationships between entities
• Store the entities and relationships in a normalized fashion (normal forms)
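To make the normalized layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the slide; the sample rows and the helper `movies_viewed_by` are hypothetical, added only for illustration. Answering "which movies did this user view?" takes one join per hop.

```python
import sqlite3

# In-memory database; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE Users  (personId INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE Movies (movieId  INTEGER PRIMARY KEY, title TEXT, country TEXT);
-- Join table holding the relationship between a user and a movie.
CREATE TABLE "View" (personId  INTEGER REFERENCES Users(personId),
                     movieId   INTEGER REFERENCES Movies(movieId),
                     view_time TEXT);
INSERT INTO Users  VALUES (1, 'Alice', 'Martin'), (2, 'Bob', 'Durand');
INSERT INTO Movies VALUES (10, 'Movie A', 'US'), (11, 'Movie B', 'FR');
INSERT INTO "View" VALUES (1, 10, '2017-01-01'),
                          (1, 11, '2017-01-02'),
                          (2, 10, '2017-01-03');
''')


def movies_viewed_by(firstname):
    """Return the titles viewed by a user: Users -> View -> Movies."""
    rows = conn.execute('''
        SELECT m.title
        FROM Users u
        JOIN "View" v ON v.personId = u.personId
        JOIN Movies m ON m.movieId = v.movieId
        WHERE u.firstname = ?
        ORDER BY m.title
    ''', (firstname,)).fetchall()
    return [title for (title,) in rows]


print(movies_viewed_by("Alice"))
```

Each additional hop in the question (friends of the user, movies liked by those friends, and so on) adds another join to the query.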
Graph databases
User --view--> Movie
• Define the relationships between entities
• Store the entities and relationships
• Allow end-users to explore the relationships
• Allow end-users to discover unexpected relations between entities
The value of data is proportional to the number of meaningful relationships
When to use graph databases?
Apache TinkerPop™ introduction
What is TinkerPop?
Apache TinkerPop™
• Open-source graph computing framework
• Started in 2009 by Marko A. Rodriguez, Josh Shinavier, and Peter Neubauer
• Joined the ASF in January 2015
• Currently at version 3.2.4
[Figure: TinkerPop components: Frames, Furnace, Pipes, Blueprints, Rexster, Gremlin]
TinkerPop stack
[Figure: TinkerPop stack, labelled Real-time and Batch]
Graph databases family
• RDF (Resource Description Framework)
  • AllegroGraph, Blazegraph, Ontotext, OpenLink Virtuoso …
• Property-graph
  • Neo4j, Titan, DataStax Enterprise (DSE) Graph, OrientDB …
Property Graph
A graph is
• A set of vertices (nodes) and edges (arcs)
• Formal definition: G = (V, E)
[Figure: two vertices, User and Movie, joined by one edge]
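The formal definition G = (V, E) can be written down almost literally. The sketch below is a minimal illustration in plain Python; the vertex names and the helpers `is_valid_graph` and `neighbours` are hypothetical and not part of any graph library.

```python
# V is a set of vertices, E a set of edges (ordered pairs of vertices).
V = {"User", "Movie"}
E = {("User", "Movie")}
G = (V, E)


def is_valid_graph(graph):
    """Every edge must connect two vertices that belong to V."""
    vertices, edges = graph
    return all(src in vertices and dst in vertices for src, dst in edges)


def neighbours(graph, vertex):
    """Vertices reachable from `vertex` by following one edge."""
    _, edges = graph
    return {dst for src, dst in edges if src == vertex}


print(is_valid_graph(G))
print(neighbours(G, "User"))
```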
A property-graph is
• A directed, binary, attributed multi-graph
[Figure: example property graph]
User (name: DuyHai, age: 35)
Movie (title: The Jedi Return, categories: [SF, action, space])
Edge view (view_time: xxx) from User to Movie
Edge knows on User
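Each word of that definition maps onto a small data model. The following is a minimal sketch in plain Python, not the TinkerPop API: the `Vertex` and `Edge` classes are hypothetical, the property values come from the slide, and the second user and the second view edge are invented to show the "multi-graph" part.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple id generator for vertices


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)   # "attributed"
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class Edge:
    label: str
    out_v: Vertex   # source vertex: the edge is "directed"
    in_v: Vertex    # target vertex: exactly two endpoints, so "binary"
    properties: dict = field(default_factory=dict)   # "attributed"


user = Vertex("User", {"name": "DuyHai", "age": 35})
movie = Vertex("Movie", {"title": "The Jedi Return",
                         "categories": ["SF", "action", "space"]})
friend = Vertex("User", {"name": "hypothetical friend"})  # assumption

edges = [
    Edge("view", user, movie, {"view_time": "xxx"}),
    # A second edge with the same label and endpoints is allowed:
    # that is what makes it a "multi-graph".
    Edge("view", user, movie, {"view_time": "yyy"}),
    Edge("knows", user, friend),
]

views = [e for e in edges if e.label == "view" and e.out_v is user]
print(len(views), [e.properties["view_time"] for e in views])
```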
Some definitions
Vertex properties
User: name: DuyHai, age: 35
Movie: title: The Jedi Return, categories: [SF, action, space]
Edges: the links between vertices
Edge labels: view, knows
Edge properties: view_time: xxx (on the view edge)
Gremlin graph traversal
Graph vs Hardware allegory
Graph Traversal
Example of graph traversal
[Figure: graph schema. Vertex labels: User, Movie, Person. Edge labels: friendWith, like, actor, director. Properties shown: name, gender, title, year, id, name, rating]
Give me all movies in which "Harrison Ford" has played as actor
Start from all the vertices of the graph:
g.V()
Keep only the vertices labelled Person:
g.V().hasLabel("Person")
Keep only the person whose name is "Harrison Ford":
g.V().hasLabel("Person")
  .has("name", "Harrison Ford")
Follow the outgoing actor edges to reach the movies:
g.V().hasLabel("Person")
  .has("name", "Harrison Ford")
  .out("actor")
Give me all movies in which "Harrison Ford" has played as actor with mean rating > 7
g.V().hasLabel("Person")
  .has("name", "Harrison Ford")
  .out("actor")
  .where(inE("like").values("rating").mean().is(gt(7)))
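To show what each step of this traversal does, here is a rough emulation in plain Python over a tiny in-memory graph. It is a sketch and not the Gremlin API: the data is invented, and the helpers `out`, `in_edge_values` and `movies_with` are hypothetical names. It assumes, as the traversal does, that the rating is stored on the like edge.

```python
from statistics import mean

# Made-up sample graph. Vertices: id -> properties (including the label).
vertices = {
    1: {"label": "Person", "name": "Harrison Ford"},
    2: {"label": "Person", "name": "Another Actor"},
    3: {"label": "Movie", "title": "Movie A"},
    4: {"label": "Movie", "title": "Movie B"},
    5: {"label": "User", "name": "u1"},
    6: {"label": "User", "name": "u2"},
}
# Edges: (label, source id, target id, properties).
edges = [
    ("actor", 1, 3, {}),
    ("actor", 1, 4, {}),
    ("actor", 2, 4, {}),
    ("like", 5, 3, {"rating": 9}),
    ("like", 6, 3, {"rating": 8}),
    ("like", 5, 4, {"rating": 6}),
    ("like", 6, 4, {"rating": 7}),
]


def out(vertex_ids, label):
    """Like .out(label): follow outgoing edges with the given label."""
    return [dst for vid in vertex_ids
            for (lbl, src, dst, _) in edges if lbl == label and src == vid]


def in_edge_values(vertex_id, label, key):
    """Like inE(label).values(key): a property of the incoming edges."""
    return [props[key] for (lbl, _, dst, props) in edges
            if lbl == label and dst == vertex_id]


def movies_with(actor_name, min_mean_rating):
    # g.V().hasLabel("Person")
    persons = [vid for vid, v in vertices.items() if v["label"] == "Person"]
    # .has("name", actor_name)
    named = [vid for vid in persons if vertices[vid].get("name") == actor_name]
    # .out("actor")
    movies = out(named, "actor")
    # .where(inE("like").values("rating").mean().is(gt(min_mean_rating)))
    kept = []
    for vid in movies:
        ratings = in_edge_values(vid, "like", "rating")
        if ratings and mean(ratings) > min_mean_rating:
            kept.append(vertices[vid]["title"])
    return kept


print(movies_with("Harrison Ford", 7))
```

With this sample data, Movie A has a mean rating of 8.5 and Movie B a mean of 6.5, so only Movie A passes the filter.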
More Examples
Demo
Q & A