intro to cypher for the sql developer
TRANSCRIPT
Cypher for SQL Developers
Mark Needham @markhneedham [email protected]
Talk structure
‣ Introduce data set‣ Modeling‣ Import‣ Data Integrity‣ Queries‣ Migration/Refactoring‣ Query optimisation
Introducing our data set...
Exploring transfermarkt
Exploring transfermarkt
|---------+--------------------+-----------------------------------------+--------------------+------------|| season | playerName | playerUri | playerPosition | playerAge ||---------+--------------------+-----------------------------------------+--------------------+------------|| 90/91 | Aldair | /aldair/profil/spieler/4151 | Centre Back | 24 || 90/91 | Thomas Häßler | /thomas-hassler/profil/spieler/553 | Attacking Midfield | 24 || 90/91 | Roberto Baggio | /roberto-baggio/profil/spieler/4153 | Secondary Striker | 23 || 90/91 | Karl-Heinz Riedle | /karl-heinz-riedle/profil/spieler/13806 | Centre Forward | 24 || 90/91 | Henrik Larsen | /henrik-larsen/profil/spieler/101330 | Attacking Midfield | 24 || 90/91 | Gheorghe Hagi | /gheorghe-hagi/profil/spieler/7939 | Attacking Midfield | 25 || 90/91 | Hristo Stoichkov | /hristo-stoichkov/profil/spieler/7938 | Left Wing | 24 || 90/91 | Brian Laudrup | /brian-laudrup/profil/spieler/39667 | Centre Forward | 21 || 90/91 | Miguel Ángel Nadal | /miguel-angel-nadal/profil/spieler/7676 | Centre Back | 23 ||---------+--------------------+-----------------------------------------+--------------------+------------|
Exploring transfermarkt
|-------------------+---------------------+-------------------------------------+--------------------|| sellerClubName | sellerClubNameShort | sellerClubUri | sellerClubCountry ||-------------------+---------------------+-------------------------------------+--------------------|| SL Benfica | Benfica | /benfica/startseite/verein/294 | Portugal || 1. FC Köln | 1. FC Köln | /1-fc-koln/startseite/verein/3 | Germany || ACF Fiorentina | Fiorentina | /fiorentina/startseite/verein/430 | Italy || SV Werder Bremen | Werder Bremen | /werder-bremen/startseite/verein/86 | Germany || Lyngby BK | Lyngby BK | /lyngby-bk/startseite/verein/369 | Denmark || Steaua Bucharest | Steaua | /steaua/startseite/verein/301 | Romania || CSKA Sofia | CSKA Sofia | /cska-sofia/startseite/verein/208 | Bulgaria || KFC Uerdingen 05 | KFC Uerdingen | /kfc-uerdingen/startseite/verein/95 | Germany || RCD Mallorca | RCD Mallorca | /rcd-mallorca/startseite/verein/237 | Spain ||-------------------+---------------------+-------------------------------------+--------------------|
Exploring transfermarkt
|----------------+--------------------+-------------------------------------+-------------------|| buyerClubName | buyerClubNameShort | buyerClubUri | buyerClubCountry ||----------------+--------------------+-------------------------------------+-------------------|| AS Roma | AS Roma | /as-roma/startseite/verein/12 | Italy || Juventus FC | Juventus | /juventus/startseite/verein/506 | Italy || Juventus FC | Juventus | /juventus/startseite/verein/506 | Italy || SS Lazio | Lazio | /lazio/startseite/verein/398 | Italy || AC Pisa 1909 | AC Pisa | /ac-pisa/startseite/verein/4172 | Italy || Real Madrid | Real Madrid | /real-madrid/startseite/verein/418 | Spain || FC Barcelona | FC Barcelona | /fc-barcelona/startseite/verein/131 | Spain || Bayern Munich | Bayern Munich | /bayern-munich/startseite/verein/27 | Germany || FC Barcelona | FC Barcelona | /fc-barcelona/startseite/verein/131 | Spain ||----------------+--------------------+-------------------------------------+-------------------|
Exploring transfermarkt
|--------------------------------------------------------+-------------+---------------|| transferUri | transferFee | transferRank ||--------------------------------------------------------+-------------+---------------|| /jumplist/transfers/spieler/4151/transfer_id/6993 | £6.75m | 1 || /jumplist/transfers/spieler/553/transfer_id/2405 | £5.85m | 2 || /jumplist/transfers/spieler/4153/transfer_id/84533 | £5.81m | 3 || /jumplist/transfers/spieler/13806/transfer_id/19054 | £5.63m | 4 || /jumplist/transfers/spieler/101330/transfer_id/275067 | £5.03m | 5 || /jumplist/transfers/spieler/7939/transfer_id/19343 | £3.23m | 6 || /jumplist/transfers/spieler/7938/transfer_id/11563 | £2.25m | 7 || /jumplist/transfers/spieler/39667/transfer_id/90285 | £2.25m | 8 || /jumplist/transfers/spieler/7676/transfer_id/11828 | £2.10m | 9 ||--------------------------------------------------------+-------------+---------------|
Relational Model
players
id
name
position
clubs
id
name
country
transfers
id
fee
player_age
player_id
from_club_id
to_club_id
season
Graph model
Nodes
Relationships
Properties
Labels
Relational vs Graph
Records in tables
Nodes"Soft"
relationships computed at query time
"Hard" relationships built into the
data store
Relational Import
Create players table
CREATE TABLE players (
"id" character varying(100)
NOT NULL PRIMARY KEY,
"name" character varying(150) NOT NULL,
"position" character varying(20)
);
Insert players
INSERT INTO players
VALUES('/aldair/profil/spieler/4151', 'Aldair', 'Centre Back');
INSERT INTO players
VALUES('/thomas-hassler/profil/spieler/553', 'Thomas Häßler',
'Attacking Midfield');
INSERT INTO players VALUES('/roberto-
baggio/profil/spieler/4153', 'Roberto Baggio', 'Secondary
Striker');
Create clubs table
CREATE TABLE clubs (
"id" character varying(100)
NOT NULL PRIMARY KEY,
"name" character varying(50) NOT NULL,
"country" character varying(50)
);
Insert clubs
INSERT INTO clubs VALUES('/hertha-bsc/startseite/verein/44',
'Hertha BSC', 'Germany');
INSERT INTO clubs VALUES('/cfr-cluj/startseite/verein/7769',
'CFR Cluj', 'Romania');
INSERT INTO clubs VALUES('/real-sociedad/startseite/verein/681',
'Real Sociedad', 'Spain');
Create transfers table
CREATE TABLE transfers (
"id" character varying(100) NOT NULL PRIMARY KEY,
"fee" character varying(50) NOT NULL,
"numericFee" integer NOT NULL,
"player_age" smallint NOT NULL,
"season" character varying(5) NOT NULL,
"player_id" character varying(100) NOT NULL REFERENCES players (id),
"from_club_id" character varying(100) NOT NULL REFERENCES clubs (id),
"to_club_id" character varying(100) NOT NULL REFERENCES clubs (id)
);
Insert transfers
INSERT INTO transfers VALUES('/jumplist/transfers/spieler/4151/transfer_id/6993',
'£6.75m', 6750000, '90/91', 24, '/aldair/profil/spieler/4151',
'/benfica/startseite/verein/294', '/as-roma/startseite/verein/12');
INSERT INTO transfers VALUES('/jumplist/transfers/spieler/553/transfer_id/2405',
'£5.85m', 5850000, '90/91', 24, '/thomas-hassler/profil/spieler/553', '/1-fc-
koln/startseite/verein/3', '/juventus/startseite/verein/506');
INSERT INTO transfers VALUES('/jumplist/transfers/spieler/4153/transfer_id/84533',
'£5.81m', 5810000, '90/91', 23, '/roberto-baggio/profil/spieler/4153',
'/fiorentina/startseite/verein/430', '/juventus/startseite/verein/506');
Graph Import
LOAD CSV
‣ Tool for importing CSV files‣ Intended for data sets of ~10M records‣ Works against live database‣ Use Cypher constructs to define graph
LOAD CSV
[USING PERIODIC COMMIT [1000]]
LOAD CSV WITH HEADERS FROM "(file|http)://" AS row
MATCH (:Label {property: row.header})
CREATE (:Label {property: row.header})
MERGE (:Label {property: row.header})
LOAD CSV
[USING PERIODIC COMMIT [1000]]
LOAD CSV WITH HEADERS FROM "(file|http)://" AS row
MATCH (:Label {property: row.header})
CREATE (:Label {property: row.header})
MERGE (:Label {property: row.header})
LOAD CSV
[USING PERIODIC COMMIT [1000]]
LOAD CSV WITH HEADERS FROM "(file|http)://" AS row
MATCH (:Label {property: row.header})
CREATE (:Label {property: row.header})
MERGE (:Label {property: row.header})
LOAD CSV
[USING PERIODIC COMMIT [1000]]
LOAD CSV WITH HEADERS FROM "(file|http)://" AS row
MATCH (:Label {property: row.header})
CREATE (:Label {property: row.header})
MERGE (:Label {property: row.header})
LOAD CSV
[USING PERIODIC COMMIT [1000]]
LOAD CSV WITH HEADERS FROM "(file|http)://" AS row
MATCH (:Label {property: row.header})
CREATE (:Label {property: row.header})
MERGE (:Label {property: row.header})
LOAD CSV
[USING PERIODIC COMMIT [1000]]
LOAD CSV WITH HEADERS FROM "(file|http)://" AS row
MATCH (:Label {property: row.header})
CREATE (:Label {property: row.header})
MERGE (:Label {property: row.header})
Exploring the data
LOAD CSV WITH HEADERS
FROM "file:///transfers.csv"
AS row
RETURN COUNT(*)
Exploring the data
LOAD CSV WITH HEADERS
FROM "file:///transfers.csv"
AS row
RETURN COUNT(*)
Exploring the data
LOAD CSV WITH HEADERS
FROM "file:///transfers.csv"
AS row
RETURN row
LIMIT 1
Exploring the data
LOAD CSV WITH HEADERS
FROM "file:///transfers.csv"
AS row
RETURN row
LIMIT 1
Import players
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
CREATE (player:Player {
id: row.playerUri,
name: row.playerName,
position: row.playerPosition
})
Import players
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
CREATE (player:Player {
id: row.playerUri,
name: row.playerName,
position: row.playerPosition
})
Not so fast!
Ensure uniqueness of players
CREATE CONSTRAINT ON (player:Player)
ASSERT player.id IS UNIQUE
Import players
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
CREATE (player:Player {
id: row.playerUri,
name: row.playerName,
position: row.playerPosition
})
Node 25 already exists with label Player and property "id"=[/peter-
lux/profil/spieler/84682]
Import players
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
MERGE (player:Player {id: row.playerUri})
ON CREATE SET player.name = row.playerName,
player.position = row.playerPosition
Import clubs
CREATE CONSTRAINT ON (club:Club)
ASSERT club.id IS UNIQUE
Import selling clubs
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
MERGE (club:Club {id: row.sellerClubUri})
ON CREATE SET club.name = row.sellerClubName,
club.country = row.sellerClubCountry
Import buying clubs
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
MERGE (club:Club {id: row.buyerClubUri})
ON CREATE SET club.name = row.buyerClubName,
club.country = row.buyerClubCountry
Import transfers
CREATE CONSTRAINT ON (transfer:Transfer)
ASSERT transfer.id IS UNIQUE
Import transfers
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
MATCH (player:Player {id: row.playerUri})
MATCH (source:Club {id: row.sellerClubUri})
MATCH (destination:Club {id: row.buyerClubUri})
MERGE (t:Transfer {id: row.transferUri})
ON CREATE SET t.season = row.season, t.rank = row.transferRank,
t.fee = row.transferFee
MERGE (t)-[:OF_PLAYER { age: row.playerAge }]->(player)
MERGE (t)-[:FROM_CLUB]->(source)
MERGE (t)-[:TO_CLUB]->(destination)
Schema
Optional Schema
‣ Unique node property constraint
Optional Schema
‣ Unique node property constraintCREATE CONSTRAINT ON (club:Club)
ASSERT club.id IS UNIQUE
Optional Schema
‣ Unique node property constraint‣ Node property existence constraint
Optional Schema
‣ Unique node property constraint‣ Node property existence constraintCREATE CONSTRAINT ON (club:Club)
ASSERT EXISTS(club.name)
Optional Schema
‣ Unique node property constraint‣ Node property existence constraint‣ Relationship property existence constraint
Optional Schema
‣ Unique node property constraint‣ Node property existence constraint‣ Relationship property existence constraintCREATE CONSTRAINT ON ()-[player:OF_PLAYER]-()
ASSERT exists(player.age)
SQL vs Cypher
Find player by name
SELECT *
FROM players
WHERE players.name = 'Cristiano Ronaldo'
SELECT *
FROM players
WHERE players.name = 'Cristiano Ronaldo'
MATCH (player:Player { name: "Cristiano Ronaldo" })
RETURN player
SELECT *
FROM players
WHERE players.name = 'Cristiano Ronaldo'
MATCH (player:Player { name: "Cristiano Ronaldo" })
RETURN player
SELECT *
FROM players
WHERE players.name = 'Cristiano Ronaldo'
MATCH (player:Player { name: "Cristiano Ronaldo" })
RETURN player
Find transfers between clubs
SELECT players.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
WHERE clubFrom.name = 'Tottenham Hotspur'
AND clubTo.name = 'Manchester United'
SELECT players.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
WHERE clubFrom.name = 'Tottenham Hotspur'
AND clubTo.name = 'Manchester United'
MATCH (from:Club)<-[:FROM_CLUB]-(transfer:Transfer)-[:TO_CLUB]->(to:Club),
(transfer)-[:OF_PLAYER]->(player)
WHERE from.name = "Tottenham Hotspur" AND to.name = "Manchester United"
RETURN player.name, transfer.numericFee, transfer.season
SELECT players.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
WHERE clubFrom.name = 'Tottenham Hotspur'
AND clubTo.name = 'Manchester United'
MATCH (from:Club)<-[:FROM_CLUB]-(transfer:Transfer)-[:TO_CLUB]->(to:Club),
(transfer)-[:OF_PLAYER]->(player)
WHERE from.name = "Tottenham Hotspur" AND to.name = "Manchester United"
RETURN player.name, transfer.numericFee, transfer.season
How does Neo4j use indexes?
Indexes are only used to find the starting point for queries.
Use index scans to look up rows in tables and join them with rows from other tables
Use indexes to find the starting points for a query.
Relational
Graph
How does Neo4j use indexes?
Migrating/refactoring the model
Player nationality
|------------------------------------------+--------------------+--------------------|
| playerUri | playerName | playerNationality |
|------------------------------------------+--------------------+--------------------|
| /aldair/profil/spieler/4151 | Aldair | Brazil |
| /thomas-hassler/profil/spieler/553 | Thomas Häßler | Germany |
| /roberto-baggio/profil/spieler/4153 | Roberto Baggio | Italy |
| /karl-heinz-riedle/profil/spieler/13806 | Karl-Heinz Riedle | Germany |
| /henrik-larsen/profil/spieler/101330 | Henrik Larsen | Denmark |
| /gheorghe-hagi/profil/spieler/7939 | Gheorghe Hagi | Romania |
| /hristo-stoichkov/profil/spieler/7938 | Hristo Stoichkov | Bulgaria |
| /brian-laudrup/profil/spieler/39667 | Brian Laudrup | Denmark |
| /miguel-angel-nadal/profil/spieler/7676 | Miguel Ángel Nadal | Spain |
|------------------------------------------+--------------------+--------------------|
Relational migration
Relational Model
players
id
name
position
nationality
clubs
id
name
country
transfers
id
fee
player_age
player_id
from_club_id
to_club_id
season
Add column to players table
ALTER TABLE players
ADD COLUMN nationality varying(30);
Update players table
UPDATE players
SET nationality = 'Brazil'
WHERE players.id = '/aldair/profil/spieler/4151';
UPDATE players
SET nationality = 'Germany'
WHERE players.id ='/ulf-kirsten/profil/spieler/74';
UPDATE players
SET nationality = 'England'
WHERE players.id ='/john-lukic/profil/spieler/28241';
Graph refactoring
Graph model
Add property to player nodes
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///transfers.csv" AS row
MATCH (player:Player {id: row.playerUri})
SET player.nationality = row.playerNationality
Find transfers of English players
SELECT players.name, clubFrom.name, clubTo.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
WHERE clubFrom.country = 'England' AND clubTo.country = 'England'
AND players.nationality = 'England'
ORDER BY t."numericFee" DESC
LIMIT 10
SELECT players.name, clubFrom.name, clubTo.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
WHERE clubFrom.country = 'England' AND clubTo.country = 'England'
AND players.nationality = 'England'
ORDER BY t."numericFee" DESC
LIMIT 10
MATCH (to:Club)<-[:TO_CLUB]-(t:Transfer)-[:FROM_CLUB]-(from:Club),
(t)-[:OF_PLAYER]->(player:Player)
WHERE to.country = "England" AND from.country = "England"
AND player.nationality = "England"
RETURN player.name, from.name, to.name, t.numericFee, t.season
ORDER BY t.numericFee DESC
LIMIT 10
SELECT players.name, clubFrom.name, clubTo.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
WHERE clubFrom.country = 'England' AND clubTo.country = 'England'
AND players.nationality = 'England'
ORDER BY t."numericFee" DESC
LIMIT 10
MATCH (to:Club)<-[:TO_CLUB]-(t:Transfer)-[:FROM_CLUB]-(from:Club),
(t)-[:OF_PLAYER]->(player:Player)
WHERE to.country = "England" AND from.country = "England"
AND player.nationality = "England"
RETURN player.name, from.name, to.name, t.numericFee, t.season
ORDER BY t.numericFee DESC
LIMIT 10
Countries and confederations|----------------------+----------------|| country | confederation ||----------------------+----------------|| Afghanistan | afc || Albania | uefa || Algeria | caf || American Samoa | ofc || Andorra | uefa || Angola | caf || Anguilla | concacaf || Antigua and Barbuda | concacaf || Argentina | conmebol ||----------------------+----------------|
|-----------+-----------+-------------------------------------------------|| urlName | shortName | region ||-----------+-----------+-------------------------------------------------|| afc | AFC | Asia || uefa | UEFA | Europe || ofc | OFC | Oceania || conmebol | CONMEBOL | South America || concacaf | CONCACAF | North American, Central American and Caribbean || caf | CAF | Africa ||-----------+-----------+-------------------------------------------------|
Relational migration
Relational Model
players
id
name
position
country_id
clubs
id
name
country_id
transfers
id
fee
player_age
player_id
from_club_id
to_club_id
season
countries
id
name
confederation_id
confederations
id
shortName
name
region
Create confederations table
CREATE TABLE confederations (
"id" character varying(10)
NOT NULL PRIMARY KEY,
"shortName" character varying(50) NOT NULL,
"name" character varying(100) NOT NULL,
"region" character varying(100) NOT NULL
);
Populate confederations
INSERT INTO confederations VALUES('afc', 'AFC', 'Asian Football
Confederation', 'Asia');
INSERT INTO confederations VALUES('uefa', 'UEFA', 'Union of European
Football Associations', 'Europe');
INSERT INTO confederations VALUES('ofc', 'OFC', 'Oceania Football
Confederation', 'Oceania');
Create countries table
CREATE TABLE countries (
"code" character varying(3)
NOT NULL PRIMARY KEY,
"name" character varying(50)
NOT NULL,
"federation" character varying(10) NOT NULL
REFERENCES confederations (id)
);
Populate countries
INSERT INTO countries VALUES('MNE', 'Montenegro', 'uefa');
INSERT INTO countries VALUES('LTU', 'Lithuania', 'uefa');
INSERT INTO countries VALUES('CAM', 'Cambodia', 'afc');
INSERT INTO countries VALUES('SUI', 'Switzerland', 'uefa');
INSERT INTO countries VALUES('ETH', 'Ethiopia', 'caf');
INSERT INTO countries VALUES('ARU', 'Aruba', 'concacaf');
INSERT INTO countries VALUES('SWZ', 'Swaziland', 'caf');
INSERT INTO countries VALUES('PLE', 'Palestine', 'afc');
Add column to clubs table
ALTER TABLE clubs
ADD COLUMN country_id character varying(3)
REFERENCES countries(code);
Update clubs
UPDATE clubs AS cl
SET country_id = c.code
FROM clubs
INNER JOIN countries AS c
ON c.name = clubs.country
WHERE cl.id = clubs.id;
Update clubs
# select * from clubs limit 5;
id | name | country | country_id
----------------------------------------+-----------------------------+---------------+------------
/san-jose-clash/startseite/verein/4942 | San Jose Clash | United States | USA
/chicago/startseite/verein/432 | Chicago Fire | United States | USA
/gz-evergrande/startseite/verein/10948 | Guangzhou Evergrande Taobao | China | CHN
/as-vita-club/startseite/verein/2225 | AS Vita Club Kinshasa | Congo DR | CGO
/vicenza/startseite/verein/2655 | Vicenza Calcio | Italy | ITA
(6 rows)
Remove country
ALTER TABLE clubs
DROP COLUMN country;
Add column to players table
ALTER TABLE players
ADD COLUMN country_id character varying(3)
REFERENCES countries(code);
Update players
UPDATE players AS p
SET country_id = c.code
FROM players
INNER JOIN countries AS c
ON c.name = players.nationality
WHERE p.id = players.id;
Update players
# select * from players limit 5;
id | name | position | nationality | country_id
-----------------------------------------+-------------------+--------------------+-------------+------------
/dalian-atkinson/profil/spieler/200738 | Dalian Atkinson | Attacking Midfield | England | ENG
/steve-redmond/profil/spieler/177056 | Steve Redmond | Centre Back | England | ENG
/bert-konterman/profil/spieler/6252 | Bert Konterman | Centre Back | Netherlands | NED
/lee-philpott/profil/spieler/228030 | Lee Philpott | Midfield | England | ENG
/tomasz-frankowski/profil/spieler/14911 | Tomasz Frankowski | Centre Forward | Poland | POL
(5 rows)
Remove nationality
ALTER TABLE players
DROP COLUMN nationality;
Graph refactoring
Graph model
Import confederations
LOAD CSV WITH HEADERS
FROM "file:///confederations.csv" AS row
MERGE (c:Confederation {id: row.urlName})
ON CREATE
SET c.shortName = row.shortName,
c.region = row.region,
c.name = row.name
Import countries
LOAD CSV WITH HEADERS FROM "file:///countries.csv"
AS row
MERGE (country:Country {id: row.countryCode})
ON CREATE SET country.name = row.country
WITH country, row
MATCH (conf:Confederation {id: row.confederation })
MERGE (country)-[:PART_OF]->(conf)
Refactor clubs
MATCH (club:Club)
MATCH (country:Country {name: club.country})
MERGE (club)-[:PART_OF]->(country)
REMOVE club.country
Refactor players
MATCH (player:Player)
MATCH (country:Country {name: player.nationality})
MERGE (player)-[:PLAYS_FOR]->(country)
REMOVE player.nationality
Recap: Find transfers of English players
SELECT players.name, clubFrom.name, clubTo.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
WHERE clubFrom.country = 'England' AND clubTo.country = 'England'
AND players.nationality = 'England'
ORDER BY t."numericFee" DESC
LIMIT 10
MATCH (to:Club)<-[:TO_CLUB]-(t:Transfer)-[:FROM_CLUB]-(from:Club),
(t)-[:OF_PLAYER]->(player:Player)
WHERE to.country = "England" AND from.country = "England"
AND player.nationality = "England"
RETURN player.name, from.name, to.name, t.numericFee, t.season
ORDER BY t.numericFee DESC
LIMIT 10
SELECT players.name, clubFrom.name, clubTo.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
JOIN countries AS fromCount ON clubFrom.country_id = fromCount.code
JOIN countries AS toCount ON clubTo.country_id = toCount.code
JOIN countries AS playerCount ON players.country_id = playerCount.code
WHERE fromCount.name = 'England' AND toCount.name = 'England' AND playerCount.name = 'England'
ORDER BY t."numericFee" DESC
LIMIT 10
MATCH (to:Club)<-[:TO_CLUB]-(t:Transfer)-[:FROM_CLUB]-(from:Club),
(t)-[:OF_PLAYER]->(player:Player)-[:PLAYS_FOR]->(country:Country),
(to)-[:PART_OF]->(country)<-[:PART_OF]-(from)
WHERE country.name = "England"
RETURN player.name, from.name, to.name, t.numericFee, t.season
ORDER BY t.numericFee DESC
LIMIT 10
SELECT players.name, clubFrom.name, clubTo.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
JOIN countries AS fromCount ON clubFrom.country_id = fromCount.code
JOIN countries AS toCount ON clubTo.country_id = toCount.code
JOIN countries AS playerCount ON players.country_id = playerCount.code
WHERE fromCount.name = 'England' AND toCount.name = 'England' AND playerCount.name = 'England'
ORDER BY t."numericFee" DESC
LIMIT 10
MATCH (to:Club)<-[:TO_CLUB]-(t:Transfer)-[:FROM_CLUB]-(from:Club),
(t)-[:OF_PLAYER]->(player:Player)-[:PLAYS_FOR]->(country:Country),
(to)-[:PART_OF]->(country)<-[:PART_OF]-(from)
WHERE country.name = "England"
RETURN player.name, from.name, to.name, t.numericFee, t.season
ORDER BY t.numericFee DESC
LIMIT 10
SELECT players.name, clubFrom.name, clubTo.name, t."numericFee", t.season
FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
JOIN countries AS fromCount ON clubFrom.country_id = fromCount.code
JOIN countries AS toCount ON clubTo.country_id = toCount.code
JOIN countries AS playerCount ON players.country_id = playerCount.code
WHERE fromCount.name = 'England' AND toCount.name = 'England' AND playerCount.name = 'England'
ORDER BY t."numericFee" DESC
LIMIT 10
MATCH (to:Club)<-[:TO_CLUB]-(t:Transfer)-[:FROM_CLUB]-(from:Club),
(t)-[:OF_PLAYER]->(player:Player)-[:PLAYS_FOR]->(country:Country),
(to)-[:PART_OF]->(country)<-[:PART_OF]-(from)
WHERE country.name = "England"
RETURN player.name, from.name, to.name, t.numericFee, t.season
ORDER BY t.numericFee DESC
LIMIT 10
Find transfers between different confederations
SELECT * FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
JOIN countries AS fromCountry ON clubFrom.country_id = fromCountry.code
JOIN countries AS toCountry ON clubTo.country_id = toCountry.code
JOIN confederations AS fromConfederation ON fromCountry.federation = fromConfederation.id
JOIN confederations AS toConfederation ON toCountry.federation = toConfederation.id
WHERE fromConfederation.id = 'afc' AND toConfederation.id = 'uefa'
ORDER BY t."numericFee" DESC
LIMIT 10
SELECT * FROM transfers AS t
JOIN clubs AS clubFrom ON t.from_club_id = clubFrom.id
JOIN clubs AS clubTo ON t.to_club_id = clubTo.id
JOIN players ON t.player_id = players.id
JOIN countries AS fromCountry ON clubFrom.country_id = fromCountry.code
JOIN countries AS toCountry ON clubTo.country_id = toCountry.code
JOIN confederations AS fromConfederation ON fromCountry.federation = fromConfederation.id
JOIN confederations AS toConfederation ON toCountry.federation = toConfederation.id
WHERE fromConfederation.id = 'afc' AND toConfederation.id = 'uefa'
ORDER BY t."numericFee" DESC
LIMIT 10
MATCH (to:Club)<-[:TO_CLUB]-(t:Transfer)-[:FROM_CLUB]-(from:Club),
(t)-[:OF_PLAYER]->(player:Player),
(from)-[:PART_OF*2]->(:Confederation {id: "afc"}),
(to)-[:PART_OF*2]->(:Confederation {id: "uefa"})
RETURN player.name, from.name, to.name, t.numericFee, t.season
ORDER BY t.numericFee DESC
LIMIT 10
What’s in my database?
Tables
# \dt List of relations
Schema | Name | Type | Owner
--------+----------------+-------+-------------
public | clubs | table | markneedham
public | confederations | table | markneedham
public | countries | table | markneedham
public | players | table | markneedham
public | transfers | table | markneedham
(5 rows)
Node labels
CALL db.labels()+=============+
|label |
+=============+
|Player |
+-------------+
|Club |
+-------------+
|Transfer |
+-------------+
|Loan |
+-------------+
|Confederation|
+-------------+
|Country |
+-------------+
Node labels
Table schema
# \d+ countries Table "public.countries"
Column | Type | Modifiers | Storage | Stats target | Description
------------+-----------------------+-----------+----------+--------------+-------------
code | character varying(3) | not null | extended | |
name | character varying(50) | not null | extended | |
federation | character varying(10) | not null | extended | |
Indexes:
"pk_countries" PRIMARY KEY, btree (code)
Foreign-key constraints:
"countries_federation_fkey" FOREIGN KEY (federation) REFERENCES confederations(id)
Referenced by:
TABLE "players" CONSTRAINT "playersfk" FOREIGN KEY (country_id) REFERENCES countries(code) MATCH FULL
:schema
Indexes
ON :Club(name) ONLINE
ON :Club(id) ONLINE (for uniqueness constraint)
ON :Player(name) ONLINE
ON :Player(id) ONLINE (for uniqueness constraint)
Constraints
ON (player:Player) ASSERT player.id IS UNIQUE
ON (club:Club) ASSERT exists(club.name)
ON (club:Club) ASSERT club.id IS UNIQUE
ON ()-[of_player:OF_PLAYER]-() ASSERT exists(of_player.age)
Graph schema
MATCH (country:Country)
RETURN keys(country), COUNT(*) AS times+-----------------------+
| keys(country) | times |
+-----------------------+
| ["id","name"] | 198 |
+-----------------------+
Graph schema
Graph schema
MATCH (club:Club)
RETURN keys(club), COUNT(*) AS times+---------------------------------+
| keys(club) | times |
+---------------------------------+
| ["id","name"] | 806 |
| ["name","country","id"] | 1 |
+---------------------------------+
Entity/Relationship diagram
Meta graph
Meta graph
MATCH (a)-[r]->(b) WITH head(labels(a)) AS l, head(labels(b)) AS l2, type(r) AS rel_type, count(*) as count CALL apoc.create.vNode([l],{name:l}) yield node as a CALL apoc.create.vNode([l2],{name:l2}) yield node as b CALL apoc.create.vRelationship(a,rel_type,{name:rel_type, count:count},b) YIELD rel RETURN *;
Data Integrity
Clubs without country
# SELECT * FROM clubs where country_id is null; id | name | country | country_id
---------------------------------------+-------------------------+---------------+------------
/unknown/startseite/verein/75 | Unknown | |
/pohang/startseite/verein/311 | Pohang Steelers | Korea, South |
/bluewings/startseite/verein/3301 | Suwon Samsung Bluewings | Korea, South |
/ulsan/startseite/verein/3535 | Ulsan Hyundai | Korea, South |
/africa-sports/startseite/verein/2936 | Africa Sports | Cote d'Ivoire |
/monaco/startseite/verein/162 | AS Monaco | Monaco |
/jeonbuk/startseite/verein/6502 | Jeonbuk Hyundai Motors | Korea, South |
/busan/startseite/verein/2582 | Busan IPark | Korea, South |
(8 rows)
Clubs without country
MATCH (club:Club)
WHERE NOT (club)-[:PART_OF]->()
RETURN club+=====================================================================+
|club |
+=====================================================================+
|{name: Unknown, id: /unknown/startseite/verein/75} |
+---------------------------------------------------------------------+
|{country: Monaco, name: AS Monaco, id: /monaco/startseite/verein/162}|
+---------------------------------------------------------------------+
Deleting data - SQL
# drop table countries;
ERROR: cannot drop table countries because other objects depend on
it
DETAIL: constraint playersfk on table players depends on table
countries
HINT: Use DROP ... CASCADE to drop the dependent objects too.
MATCH (country:Country)
DELETE country
org.neo4j.kernel.api.exceptions.TransactionFailureException: Node
record Node[11306,used=false,rel=24095,prop=-1,labels=Inline(0x0:
[]),light] still has relationships
Deleting data - Cypher
MATCH (country:Country)
DETACH DELETE country
Deleted 198 nodes, deleted 5071 relationships, statement executed
in 498 ms.
Deleting data - Cypher
Query Optimisation
Optimising queries
‣ Use EXPLAIN/PROFILE to see what your queries are doing under the covers
‣ Index the starting points of queries‣ Reduce work in progress of intermediate
parts of the query where possible‣ Look at the warnings in the Neo4j browser -
they are often helpful!
Optimising queries - useful links
‣ Tuning Your Cypherhttps://www.youtube.com/watch?v=tYtyoYcd_e8
‣ Neo4j 2.2 Query Tuninghttp://neo4j.com/blog/neo4j-2-2-query-tuning/
‣ Ask for help on Stack Overflow/Neo4j Slackhttp://neo4j-users-slack-invite.herokuapp.com
One more thing...
‣ New in Neo4j 3.0.0!
Procedures
‣ New in Neo4j 3.0.0!‣ We’ve already seen an example!
CALL db.labels()
‣ Michael Hunger has created a set of procedures (APOC) at:https://github.com/jexp/neo4j-apoc-procedures
Procedures
WITH "https://api.github.com/search/repositories?q=neo4j"
AS githubUri
CALL apoc.load.json(githubUri)
YIELD value AS document
UNWIND document.items AS item
RETURN item.full_name, item.watchers_count, item.forks
ORDER BY item.forks DESC
Querying github
WITH "https://api.github.com/search/repositories?q=neo4j"
AS githubUri
CALL apoc.load.json(githubUri)
YIELD value AS document
UNWIND document.items AS item
RETURN item.full_name, item.watchers_count, item.forks
ORDER BY item.forks DESC
Querying github
+------------------------------------------------------------------------+
| item.full_name | item.watchers_count | item.forks |
+------------------------------------------------------------------------+
| "neo4j/neo4j" | 2472 | 872 |
| "spring-projects/spring-data-neo4j" | 403 | 476 |
| "neo4j-contrib/developer-resources" | 106 | 295 |
| "neo4jrb/neo4j" | 1014 | 190 |
| "jadell/neo4jphp" | 507 | 140 |
| "thingdom/node-neo4j" | 780 | 127 |
| "aseemk/node-neo4j-template" | 176 | 91 |
| "jimwebber/neo4j-tutorial" | 268 | 87 |
| "rickardoberg/neo4j-jdbc" | 33 | 68 |
| "FaKod/neo4j-scala" | 194 | 64 |
+------------------------------------------------------------------------+
Querying github
Questions? :-)
Mark [email protected] @markhneedham
https://github.com/neo4j-meetups/cypher-for-sql-developers