In a League of their Own: Neo4j and Premiership Football
Mark Needham
@markhneedham
Outline
• Intro to graphs
• When do we need a graph?
• Property graph model
• Neo4j’s query language
• The football graph
• Using Neo4j from .NET
Let’s talk graphs
You mean these?
[Chart labels: Eating Brains, Dancing With Michael Jackson]
Nope!
These are Charts!
NOT Graphs!
Ok so what’s a graph then?
Node
Relationship
The tube
The social network (graph)
Complexity
What are graphs good for?
complexity = f(size, semi-structure, connectedness)
Data Complexity
Size
complexity = f(size, semi-structure, connectedness)
The Real Complexity
Semi-Structure
Email: [email protected]
Email: [email protected]
Twitter: @markhneedham
Skype: mk_jnr1984
USER
CONTACT
CONTACT_TYPE
USER_ID  FIRST_NAME  LAST_NAME  EMAIL_1            EMAIL_2            TWITTER        FACEBOOK  SKYPE
315      Mark        Needham    [email protected]  [email protected]  @markhneedham  NULL      mk_jnr1984
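The sparse row above can be contrasted with a node that stores only the properties it actually has. This is a minimal Python sketch; the variable names are illustrative and not from the talk, and the email columns are left out for brevity.

```python
# One wide relational row: every user carries every contact column,
# so a missing contact type has to be stored as NULL (None here).
row = {
    "USER_ID": 315,
    "FIRST_NAME": "Mark",
    "LAST_NAME": "Needham",
    "TWITTER": "@markhneedham",
    "FACEBOOK": None,
    "SKYPE": "mk_jnr1984",
}

# The same user as a node: a property that does not exist is simply absent.
node = {key: value for key, value in row.items() if value is not None}

# Columns that existed in the row only to hold a NULL.
null_only_columns = sorted(set(row) - set(node))
print(null_only_columns)
```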
Semi-Structure
complexity = f(size, semi-structure, connectedness)
The Real Complexity
Connectedness
When do we need a graph?
Densely Connected
Semi-Structured
Densely connected?
Lots of join tables
Semi-Structured?
Lots of sparse tables
Properties of graph databases
• Millions of ‘joins’ per second
• Consistent query times as dataset grows
• Join complexity and performance
• Easy to evolve data model
• Easy to ‘layer’ different types of data together
Property Graph Data Model
Nodes
Nodes can have properties
• Used to represent entity attributes and/or metadata (e.g. timestamps, version)
• Key-value pairs
  – Java primitives
  – Arrays
  – null is not a valid value
• Every node can have different properties
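The property rules above can be sketched as a small validation function. This is a hedged illustration in Python, not Neo4j's implementation: `validate_properties` and `ALLOWED_SCALARS` are hypothetical names, Python scalars stand in for the Java primitives, and the `version` property is invented metadata.

```python
# Python scalar types standing in for the "Java primitives" on the slide.
ALLOWED_SCALARS = (bool, int, float, str)


def validate_properties(properties):
    """Return the properties unchanged, or raise if a value is not storable."""
    for key, value in properties.items():
        if value is None:
            # null is not a valid value: leave the property out instead.
            raise ValueError(f"property {key!r} is null; omit it instead")
        # Treat a list as an array and check each element.
        values = value if isinstance(value, list) else [value]
        if not all(isinstance(v, ALLOWED_SCALARS) for v in values):
            raise TypeError(f"property {key!r} has an unsupported type")
    return properties


# Two nodes with different property sets; both are valid.
team = validate_properties({"name": "Arsenal", "version": 1})
person = validate_properties({"name": "Mark Needham", "skype": "mk_jnr1984"})
```

Calling `validate_properties({"facebook": None})` raises `ValueError`, which mirrors the rule that a missing property is left out and not stored as null.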
What’s a node?
Relationships
• Relationships are first class citizens
• Every relationship has a name and a direction
  – Add structure to the graph
  – Provide semantic context for nodes
• Properties used to represent quality or weight of relationship, or metadata
• Every relationship must have a start node and end node
Relationships
Nodes can have more than one relationship
Self relationships are allowed
Nodes can be connected by more than one relationship
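The relationship rules above (a name, a direction, a start node and an end node, optional properties) can be sketched with a small data class. The relationship names `KNOWS` and `WORKS_WITH`, the node names, and the `since` property are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    start: str       # every relationship must have a start node...
    rel_type: str    # ...a name...
    end: str         # ...and an end node; start -> end gives the direction
    properties: dict = field(default_factory=dict)


relationships = [
    Relationship("a", "KNOWS", "b"),
    # A second relationship between the same pair of nodes is allowed.
    Relationship("a", "WORKS_WITH", "b", {"since": 2012}),
    # So is a relationship from a node to itself.
    Relationship("a", "KNOWS", "a"),
]


def outgoing(node):
    """All relationships that start at the given node."""
    return [r for r in relationships if r.start == node]
```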
Labels
Think Gmail labels
Four Building Blocks
• Nodes
  – Entities
• Relationships
  – Connect entities and structure domain
• Properties
  – Entity attributes, relationship qualities, and metadata
• Labels
  – Group nodes by role
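As a rough illustration of labels grouping nodes by role, the Python sketch below tags each node with a set of labels and filters on them. The `Team` label appears in the talk's queries; the `Player` and `Captain` labels and the "Example Captain" node are assumptions added for the example.

```python
nodes = [
    {"labels": {"Team"}, "properties": {"name": "Arsenal"}},
    {"labels": {"Player"}, "properties": {"name": "Michu"}},
    # A node can carry more than one label, much like a Gmail message.
    {"labels": {"Player", "Captain"}, "properties": {"name": "Example Captain"}},
]


def with_label(label):
    """Names of all nodes that carry the given label."""
    return [n["properties"]["name"] for n in nodes if label in n["labels"]]
```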
Models
Purposeful abstraction of a domain designed to satisfy particular application/end-user goals
Design for Queryability
[Diagram: Model and Query]
Introducing Cypher
• Declarative pattern-matching language
• SQL-like syntax
• Designed for graphs
Patterns, patterns, everywhere
[Diagram: nodes A, B and C]
It’s all about the ASCII art!
(a) --> (b)
[Diagram: node a with an arrow to node b]
The most basic query
MATCH (a)-->(b)
RETURN a, b
Adding in a relationship type
(a)-[:ACTED_IN]->(m)
[Diagram: node a connected to node m by an ACTED_IN relationship]
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, m.name
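To make the two patterns concrete, here is a small in-memory analogue in Python. It is only a sketch of what the patterns select, not how Cypher is executed; the sample edges and the `DIRECTED` type are illustrative.

```python
# Each edge is (start node, relationship type, end node).
edges = [
    ("Keanu", "ACTED_IN", "The Matrix"),
    ("Lana", "DIRECTED", "The Matrix"),
]


def match(rel_type=None):
    """With no type: like MATCH (a)-->(b). With a type: like MATCH (a)-[:TYPE]->(b)."""
    return [
        (start, end)
        for start, edge_type, end in edges
        if rel_type is None or edge_type == rel_type
    ]


all_pairs = match()            # every directed pair, whatever the type
actors = match("ACTED_IN")     # only pairs joined by ACTED_IN
```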
The football graph
Find Arsenal’s away matches
MATCH (team:Team)<-[:away_team]-(game)
WHERE team.name = "Arsenal"
RETURN game.name
• Graph pattern: MATCH (team:Team)<-[:away_team]-(game)
• Anchor pattern in graph: WHERE team.name = "Arsenal"
• Create projection of results: RETURN game.name
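The three steps of the query (pattern, anchor, projection) can be mimicked over a tiny in-memory graph. This Python sketch uses invented fixtures and node ids, and the format of the game names is an assumption.

```python
teams = {
    "t1": {"labels": {"Team"}, "name": "Arsenal"},
    "t2": {"labels": {"Team"}, "name": "Tottenham"},
}
games = {
    "g1": {"name": "Tottenham vs Arsenal"},
    "g2": {"name": "Arsenal vs Tottenham"},
}
# (game)-[:away_team]->(team), stored as (game id, team id) pairs.
away_team = [("g1", "t1"), ("g2", "t2")]


def away_matches(team_name):
    return [
        games[game_id]["name"]                      # projection: RETURN game.name
        for game_id, team_id in away_team           # pattern: (team:Team)<-[:away_team]-(game)
        if "Team" in teams[team_id]["labels"]
        and teams[team_id]["name"] == team_name     # anchor: WHERE team.name = ...
    ]
```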
Evolving the football graph
Find the top away goal scorers
MATCH (team)<-[:away_team]-(game:Game),
      (game)<-[:contains_match]-(season:Season),
      (team)<-[:for]-(stats)<-[:played]-(player),
      (stats)-[:in]->(game)
WHERE season.name = "2012-2013"
RETURN player.name,
       COLLECT(DISTINCT team.name),
       SUM(stats.goals) AS goals
ORDER BY goals DESC
LIMIT 10
• Multiple graph patterns: the comma-separated paths in MATCH
• Anchor pattern in the graph: WHERE season.name = "2012-2013"
• Group by player: RETURN player.name alongside the COLLECT and SUM aggregates
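The RETURN clause groups implicitly: the non-aggregated column (`player.name`) becomes the grouping key for `COLLECT` and `SUM`. A minimal Python sketch of that aggregation follows; the rows stand for already-matched paths, and the players, teams and goal counts are invented purely to exercise the logic.

```python
from collections import defaultdict

# One row per matched (player, team, stats) path for away games in the season.
rows = [
    ("Player A", "Team X", 2),
    ("Player B", "Team Y", 1),
    ("Player A", "Team X", 1),
    ("Player C", "Team Z", 0),
    ("Player B", "Team W", 1),
]


def top_scorers(rows, limit=10):
    teams = defaultdict(set)
    goals = defaultdict(int)
    for player, team, scored in rows:
        teams[player].add(team)    # COLLECT(DISTINCT team.name)
        goals[player] += scored    # SUM(stats.goals)
    # ORDER BY goals DESC, then LIMIT
    ranked = sorted(goals, key=lambda p: goals[p], reverse=True)
    return [(p, sorted(teams[p]), goals[p]) for p in ranked[:limit]]
```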
Other football queries
• Goals scored in each month by Michu
• Tottenham results when Gareth Bale scores
• What did Wayne Rooney do in April?
• Which players only score when a game is televised?
Graph Query Design
The relational version
Graph vs Relational
Relational
• Tables
  – assume records all have the same structure
• Foreign keys between tables
  – joins calculated at run time
  – the more tables you join to a query the slower the query gets
Graphs
• Nodes
  – no need to set a property if it doesn’t exist
• Relationships
  – stored as a ‘pre-computed index’ at write time
  – very easy to do lots of ‘hops’ between relationships
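The difference between run-time joins and relationships recorded at write time can be sketched in Python. This is a deliberate simplification and not a description of Neo4j's storage format; the names and data are hypothetical.

```python
# Relational style: a join table that is searched at query time.
friend_rows = [("alice", "bob"), ("bob", "carol"), ("alice", "dave")]


def friends_via_join(person):
    # Scans every row of the join table on every query.
    return [b for a, b in friend_rows if a == person]


# Graph style: adjacency is recorded once, when the relationship is written.
adjacency = {}


def add_relationship(start, end):
    adjacency.setdefault(start, []).append(end)


for start, end in friend_rows:
    add_relationship(start, end)


def friends_of_friends(person):
    # Two "hops": each hop is a direct lookup from the current node.
    return [
        fof
        for friend in adjacency.get(person, [])
        for fof in adjacency.get(friend, [])
    ]
```

Each extra hop in `friends_of_friends` only touches the neighbours of the nodes already reached, whereas each extra join in the relational version repeats a scan or index lookup over the whole join table.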
.NET and Neo4j
[Diagram: the Application uses a REST client (Neo4jClient) to talk to the Neo4j Server over HTTP]
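The slides do not include the Neo4jClient code, so the following is not its API. It is a language-neutral sketch, in Python, of what a REST client roughly does: wrap a Cypher statement and its parameters in JSON and POST it to the server. The endpoint URL is an assumption (the path differs between Neo4j versions), as is the `$name` parameter syntax (older versions used `{name}`); `build_cypher_request` is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request


def build_cypher_request(endpoint_url, statement, parameters):
    """Build (but do not send) an HTTP POST carrying one Cypher statement."""
    body = json.dumps(
        {"statements": [{"statement": statement, "parameters": parameters}]}
    )
    return urllib.request.Request(
        endpoint_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_cypher_request(
    "http://localhost:7474/db/data/transaction/commit",  # assumed endpoint
    "MATCH (team:Team)<-[:away_team]-(game) "
    "WHERE team.name = $name RETURN game.name",
    {"name": "Arsenal"},
)
```

Passing the team name as a parameter, instead of concatenating it into the query string, keeps the statement text constant across calls.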
Thinking in graphs
Graphs should be fun!
Ask for help if you get stuck
Last Wednesday of the month
Come take a copy, it’s free!
www.graphdatabases.com
Questions?
Mark [email protected]@neotechnology.com