tips and tricks for graph data modeling

80
Graph Modeling Tips and Tricks Mark Needham @markhneedham

Upload: neo4j-the-fastest-and-most-scalable-native-graph-database

Post on 22-Jan-2017

1.007 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Tips and Tricks for Graph Data Modeling

Graph ModelingTips and Tricks

Mark Needham @markhneedham

Page 2: Tips and Tricks for Graph Data Modeling

Introducing our data set...

Page 3: Tips and Tricks for Graph Data Modeling

meetup.com’s recommendations

Page 4: Tips and Tricks for Graph Data Modeling

Recommendation queries

‣ Several different types• groups to join• topics to follow• events to attend

‣ As a user of meetup.com trying to find groups to join and events to attend

Page 5: Tips and Tricks for Graph Data Modeling

How will this talk be structured?

Page 6: Tips and Tricks for Graph Data Modeling

Find similar groups to Neo4j

As a member of the Neo4j London group

I want to find other similar meetup groups

So that I can join those groups

Page 7: Tips and Tricks for Graph Data Modeling

What makes groups similar?

Page 8: Tips and Tricks for Graph Data Modeling

As a member of the Neo4j London group

I want to find other similar meetup groups

So that I can join those groups

Find similar groups to Neo4j

Page 9: Tips and Tricks for Graph Data Modeling

As a member of the Neo4j London group

I want to find other similar meetup groups

So that I can join those groups

Nodes

Page 10: Tips and Tricks for Graph Data Modeling

As a member of the Neo4j London group

I want to find other similar meetup groups

So that I can join those groups

Relationships

Page 11: Tips and Tricks for Graph Data Modeling

As a member of the Neo4j London group

I want to find other similar meetup groups

So that I can join those groups

Labels

Page 12: Tips and Tricks for Graph Data Modeling

As a member of the Neo4j London group

I want to find other similar meetup groups

So that I can join those groups

Properties

Page 13: Tips and Tricks for Graph Data Modeling

Find similar groups to Neo4j

MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topicsORDER BY topicsInCommon DESC, otherGroup.nameLIMIT 10

Page 14: Tips and Tricks for Graph Data Modeling

Find similar groups to Neo4j

Page 15: Tips and Tricks for Graph Data Modeling

Tip: Model incrementally

‣ Build the model for the question we need to answer now then move onto the next question

Page 16: Tips and Tricks for Graph Data Modeling

I’m already a member of these!

Page 17: Tips and Tricks for Graph Data Modeling

What other data can we get?

Page 18: Tips and Tricks for Graph Data Modeling

Exclude groups I’m a member of

As a member of the Neo4j London group

I want to find other similar meetup groupsthat I’m not already a member of

So that I can join those groups

Page 19: Tips and Tricks for Graph Data Modeling

Exclude groups I’m a member of

As a member of the Neo4j London group

I want to find other similar meetup groupsthat I’m not already a member of

So that I can join those groups

Page 20: Tips and Tricks for Graph Data Modeling

Exclude groups I’m a member of

MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group)RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, EXISTS((:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(otherGroup)) AS alreadyMember, COLLECT(topic.name) AS topicsORDER BY topicsInCommon DESCLIMIT 10

Page 21: Tips and Tricks for Graph Data Modeling

Exclude groups I’m a member of

Page 22: Tips and Tricks for Graph Data Modeling

Exclude groups I’m a member of

MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group)WHERE NOT( (:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(otherGroup) )RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topicsORDER BY topicsInCommon DESCLIMIT 10

Page 23: Tips and Tricks for Graph Data Modeling

Exclude groups I’m a member of

Page 24: Tips and Tricks for Graph Data Modeling

Find my similar groups

As a member of several meetup groups

I want to find other similar meetup groups

that I’m not already a member of

So that I can join those groups

Page 25: Tips and Tricks for Graph Data Modeling

Find my similar groups

As a member of several meetup groups

I want to find other similar meetup groups

that I’m not already a member of

So that I can join those groups

Page 26: Tips and Tricks for Graph Data Modeling

Find my similar groups

MATCH (member:Member {name: "Mark Needham"}) -[:INTERESTED_IN]->(topic), (member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)

WITH member, topic, COUNT(*) AS scoreMATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as scoreORDER BY score DESC

Page 27: Tips and Tricks for Graph Data Modeling

Find my similar groups

Page 28: Tips and Tricks for Graph Data Modeling

Find Jonny’s similar groups

Page 29: Tips and Tricks for Graph Data Modeling

Oops...Jonny has no interests!

Page 30: Tips and Tricks for Graph Data Modeling

What is Jonny interested in?

As a member of several meetup groups

I want to find other similar meetup groups

that I’m not already a member of

So that I can join those groups

INTERESTED_IN?

Page 31: Tips and Tricks for Graph Data Modeling

What is Jonny interested in?

There’s an implicit INTERESTED_IN relationship between the topics of groups I belong to but don’t express an interest in. Let’s make it explicit

P

G

T

MEMBER_OF

HAS_TOPIC

P

G

T

MEMBER_OF

HAS_TOPIC

INTERESTED_IN

Page 32: Tips and Tricks for Graph Data Modeling

What is Jonny interested in?

MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)

WITH m, topic, COUNT(*) AS times WHERE times > 3

MERGE (m)-[:INTERESTED_IN]->(topic)

Page 33: Tips and Tricks for Graph Data Modeling

What is Jonny interested in?

Page 34: Tips and Tricks for Graph Data Modeling

Tip: Make the implicit explicit

‣ Fill in the missing links in the graph‣ You could run this type of query once a day

during a quiet period‣ On bigger graphs we’d run it in

batches to avoid loading thewhole database into memory

Page 35: Tips and Tricks for Graph Data Modeling

Find next group people join

As a member of a meetup group

I want to find out which meetup groups other people join after this one

So that I can join those groups

Page 36: Tips and Tricks for Graph Data Modeling

Find next group people join

MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup)WHERE membership.joined < otherMembership.joinedWITH member, otherGroup ORDER BY otherMembership.joinedWITH member, COLLECT(otherGroup)[0] AS nextGroupRETURN nextGroup.name, COUNT(*) AS timesORDER BY times DESC

Page 37: Tips and Tricks for Graph Data Modeling

Find next group people join

Page 38: Tips and Tricks for Graph Data Modeling

It feels a bit clunky...

MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup)WHERE membership.joined < otherMembership.joinedWITH member, otherGroup ORDER BY otherMembership.joinedWITH member, COLLECT(otherGroup)[0] AS nextGroupRETURN nextGroup.name, COUNT(*) AS timesORDER BY times DESC

‣ We have to scan through all the MEMBER_OF relationships to find the one we want

‣ It might make our lives easier if we made membership a first class citizen of the domain

Page 39: Tips and Tricks for Graph Data Modeling

Facts can become nodes

Page 40: Tips and Tricks for Graph Data Modeling

Facts can become nodes

Page 41: Tips and Tricks for Graph Data Modeling

Refactor to facts

MATCH (member:Member)-[rel:MEMBER_OF]->(group)

MERGE (membership:Membership {id: member.id + "_" + group.id})SET membership.joined = rel.joined

MERGE (member)-[:HAS_MEMBERSHIP]->(membership)MERGE (membership)-[:OF_GROUP]->(group)

Page 42: Tips and Tricks for Graph Data Modeling

Refactor to facts

MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)

WITH member, membership ORDER BY member.id, membership.joined

WITH member, COLLECT(membership) AS membershipsUNWIND RANGE(0,SIZE(memberships) - 2) as idx

WITH memberships[idx] AS m1, memberships[idx+1] AS m2MERGE (m1)-[:NEXT]->(m2)

Page 43: Tips and Tricks for Graph Data Modeling

Find next group people join

MATCH (group:Group {name: "Neo4j - London User Group"}) <-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership), (membership)<-[:HAS_MEMBERSHIP]-(member:Member) -[:HAS_MEMBERSHIP]->(nextMembership), (nextMembership)-[:OF_GROUP]->(nextGroup)RETURN nextGroup.name, COUNT(*) AS timesORDER BY times DESC

Page 44: Tips and Tricks for Graph Data Modeling

Comparing the approaches

vs

MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup)WITH member, membership, otherMembership, otherGroup ORDER BY member.id, otherMembership.joinedWHERE membership.joined < otherMembership.joinedWITH member, membership, COLLECT(otherGroup)[0] AS nextGroupRETURN nextGroup.name, COUNT(*) AS timesORDER BY times DESC

MATCH (group:Group {name: "Neo4j - London User Group"}) <-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership), (membership)<-[:HAS_MEMBERSHIP]-(member:Member) -[:HAS_MEMBERSHIP]->(nextMembership), (nextMembership)-[:OF_GROUP]->(nextGroup)RETURN nextGroup.name, COUNT(*) AS timesORDER BY times DESC

Page 45: Tips and Tricks for Graph Data Modeling

How do I profile a query?

‣ EXPLAIN• shows the execution plan without actually

executing it or returning any results.

‣ PROFILE• executes the statement and returns the results

along with profiling information.

45

Page 46: Tips and Tricks for Graph Data Modeling

Neo4j’s longest plan (so far…)

46

Page 47: Tips and Tricks for Graph Data Modeling

Neo4j’s longest plan (so far…)

47

Page 48: Tips and Tricks for Graph Data Modeling

Neo4j’s longest plan (so far…)

48

Page 49: Tips and Tricks for Graph Data Modeling

What is our goal?

At a high level, the goal is simple: get the number of db hits down.

49

Page 50: Tips and Tricks for Graph Data Modeling

an abstract unit of storage engine work.

What is a database hit?

“”

50

Page 51: Tips and Tricks for Graph Data Modeling

Comparing the approaches

Cypher version: CYPHER 2.3, planner: COST. 111656 total db hits in 330 ms.vsCypher version: CYPHER 2.3, planner: COST.23650 total db hits in 60 ms.

Page 52: Tips and Tricks for Graph Data Modeling

Tip: Profile your queries

‣ Spike the different models and see which one performs the best

Page 53: Tips and Tricks for Graph Data Modeling

Should we keep both models?

We could but when we add, edit or remove a membership we’d have to keep both graph structures in sync.

Page 54: Tips and Tricks for Graph Data Modeling

Adding a group membershipWITH "Mark Needham" AS memberName, "Neo4j - London User Group" AS groupName, timestamp() AS now

MATCH (group:Group {name: groupName})MATCH (member:Member {name: memberName})

MERGE (member)-[memberOfRel:MEMBER_OF]->(group)ON CREATE SET memberOfRel.time = now

MERGE (membership:Membership {id: member.id + "_" + group.id})ON CREATE SET membership.joined = nowMERGE (member)-[:HAS_MEMBERSHIP]->(membership)MERGE (membership)-[:OF_GROUP]->(group)

Page 55: Tips and Tricks for Graph Data Modeling

Removing a group membershipWITH "Mark Needham" AS memberName, "Neo4j - London User Group" AS groupName, timestamp() AS now

MATCH (group:Group {name: groupName})MATCH (member:Member {name: memberName})

MATCH (member)-[memberOfRel:MEMBER_OF]->(group)

MATCH (membership:Membership {id: member.id + "_" + group.id})MATCH (member)-[hasMembershipRel:HAS_MEMBERSHIP]->(membership)MATCH (membership)-[ofGroupRel:OF_GROUP]->(group)

DELETE memberOfRel, hasMembershipRel, ofGroupRel, membership

Page 56: Tips and Tricks for Graph Data Modeling

Let’s delete MEMBER_OF then...

Page 57: Tips and Tricks for Graph Data Modeling

...not so fast!

As a member of several meetup groups

I want to find other similar meetup groups

that I’m not already a member of

So that I can join those groups

Page 58: Tips and Tricks for Graph Data Modeling

Why not delete MEMBER_OF?MATCH (member:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)WITH member, topic, COUNT(*) AS scoreMATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as scoreORDER BY score DESC

MATCH (member:Member {name: "Mark Needham"}) -[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic)WITH member, topic, COUNT(*) AS scoreMATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group)RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as scoreORDER BY score DESC

Page 59: Tips and Tricks for Graph Data Modeling

Why not delete MEMBER_OF?MATCH (member:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)WITH member, topic, COUNT(*) AS scoreMATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as scoreORDER BY score DESC

MATCH (member:Member {name: "Mark Needham"}) -[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic)WITH member, topic, COUNT(*) AS scoreMATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group)RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as scoreORDER BY score DESC

433318 total db hits in 485 ms.

83268 total db hits in 117 ms.

Page 60: Tips and Tricks for Graph Data Modeling

Tip: Maintaining multiple models

‣ Different models perform better for different queries but worse for others

‣ Optimising for reads may mean we pay a write and maintenance penalty

Page 61: Tips and Tricks for Graph Data Modeling

What about events?

Page 62: Tips and Tricks for Graph Data Modeling

Events in my groups

As a member of several meetup groups who has previously attended events

I want to find other events hosted by those groups

So that I can attend those events

Page 63: Tips and Tricks for Graph Data Modeling

Events in my groups

As a member of several meetup groups who has previously attended events

I want to find other events

hosted by those groups

So that I can attend those events

Page 64: Tips and Tricks for Graph Data Modeling

WITH 24.0*60*60*1000 AS oneDay

MATCH (member:Member {name: "Mark Needham"}),

(member)-[:MEMBER_OF]->(group),

(group)-[:HOSTED_EVENT]->(futureEvent)

WHERE futureEvent.time >= timestamp()

AND NOT (member)-[:RSVPD]->(futureEvent)

RETURN group.name, futureEvent.name,

round((futureEvent.time - timestamp()) / oneDay) AS days

ORDER BY days

LIMIT 10

Events in my groups

Page 65: Tips and Tricks for Graph Data Modeling

Events in my groups

Page 66: Tips and Tricks for Graph Data Modeling

+ previous events attendedWITH 24.0*60*60*1000 AS oneDay

MATCH (member:Member {name: "Mark Needham"})

MATCH (futureEvent:Event)

WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent)

MATCH (futureEvent)<-[:HOSTED_EVENT]-(group)

WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember

OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group)

WHERE pastEvent.time < timestamp()

RETURN group.name,

futureEvent.name,

isMember,

COUNT(rsvp) AS previousEvents,

round((futureEvent.time - timestamp()) / oneDay) AS days

ORDER BY days, previousEvents DESC

Page 67: Tips and Tricks for Graph Data Modeling

+ previous events attended

Page 68: Tips and Tricks for Graph Data Modeling

RSVPD_YES vs RSVPD

I was curious whether refactoring RSVPD {response: "yes"} to RSVPD_YES would have any impact as Neo4j is optimised for querying by unique relationship types.

Page 69: Tips and Tricks for Graph Data Modeling

Refactor to specific relationships

MATCH (m:Member)-[rsvp:RSVPD {response:"yes"}]->(event)

MERGE (m)-[rsvpYes:RSVPD_YES {id: rsvp.id}]->(event)

ON CREATE SET rsvpYes.created = rsvp.created,

rsvpYes.lastModified = rsvp.lastModified;

MATCH (m:Member)-[rsvp:RSVPD {response:"no"}]->(event)

MERGE (m)-[rsvpYes:RSVPD_NO {id: rsvp.id}]->(event)

ON CREATE SET rsvpYes.created = rsvp.created,

rsvpYes.lastModified = rsvp.lastModified;

Page 70: Tips and Tricks for Graph Data Modeling

RSVPD_YES vs RSVPD

RSVPD {response: "yes"}

vs

RSVPD_YES

Cypher version: CYPHER 2.3, planner: COST. 688635 total db hits in 232 ms.

Cypher version: CYPHER 2.3, planner: COST. 559866 total db hits in 207 ms.

Page 71: Tips and Tricks for Graph Data Modeling

Why would we keep RSVPD?

MATCH (m:Member)-[r:RSVPD]->(event)<-[:HOSTED_EVENT]-(group)

WHERE m.name = "Mark Needham"

RETURN event, group, r

MATCH (m:Member)-[r:RSVPD_YES|:RSVPD_NO|:RSVPD_WAITLIST]->(event),

(event)<-[:HOSTED_EVENT]-(group)

WHERE m.name = "Mark Needham"

RETURN event, group, r

Page 72: Tips and Tricks for Graph Data Modeling

Tip: Specific relationships

‣ Neo4j is optimised for querying by unique relationship types…

‣ ...but sometimes we pay a query maintenance cost to achieve this

Page 73: Tips and Tricks for Graph Data Modeling

+ events my friends are attending

There’s an implicit FRIENDS relationship between people who attended the same events.Let’s make it explicit.

M

E

M

RSVPD

RSVPD

FRIENDS

M

E

M

RSVPD

RSVPD

Page 74: Tips and Tricks for Graph Data Modeling

+ events my friends are attending

MATCH (m1:Member)WHERE NOT m1:Processed

WITH m1 LIMIT {limit}MATCH (m1)-[:RSVP_YES]->(event:Event)<-[:RSVP_YES]-(m2:Member)

WITH m1, m2, COLLECT(event) AS events, COUNT(*) AS timesWHERE times >= 5

WITH m1, m2, times, [event IN events | SIZE((event)<-[:RSVP_YES]-())] AS attendances

WITH m1, m2, REDUCE(score = 0.0, a IN attendances | score + (1.0 / a)) AS score

MERGE (m1)-[friendsRel:FRIENDS]-(m2)SET friendsRel.score = row.score

Page 75: Tips and Tricks for Graph Data Modeling

Bidirectional relationships

‣ You may have noticed that we didn’t specify a direction when creating the relationshipMERGE (m1)-[:FRIENDS]-(m2)

‣ FRIENDS is a bidirectional relationship. We only need to create it once between two people.

‣ We ignore the direction when querying

Page 76: Tips and Tricks for Graph Data Modeling

+ events my friends are attendingWITH 24.0*60*60*1000 AS oneDayMATCH (member:Member {name: "Mark Needham"})MATCH (futureEvent:Event) WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent)MATCH (futureEvent)<-[:HOSTED_EVENT]-(group)

WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMemberOPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group)WHERE pastEvent.time < timestamp()

WITH oneDay, group, futureEvent, member, isMember, COUNT(rsvp) AS previousEventsOPTIONAL MATCH (futureEvent)<-[:HOSTED_EVENT]-()-[:HAS_TOPIC]->(topic)<-[:INTERESTED_IN]-(member)

WITH oneDay, group, futureEvent, member, isMember, previousEvents, COUNT(topic) AS topicsOPTIONAL MATCH (member)-[:FRIENDS]-(:Member)-[rsvpYes:RSVP_YES]->(futureEvent)

RETURN group.name, futureEvent.name, isMember, round((futureEvent.time - timestamp()) / oneDay) AS days, previousEvents, topics, COUNT(rsvpYes) AS friendsGoingORDER BY days, friendsGoing DESC, previousEvents DESCLIMIT 15

Page 77: Tips and Tricks for Graph Data Modeling

+ events my friends are attending

Page 78: Tips and Tricks for Graph Data Modeling

Tip: Bidirectional relationships

‣ Some relationships are bidirectional in nature‣ Neo4j always stores relationships with a

direction but we can choose to ignore that when we query

Page 79: Tips and Tricks for Graph Data Modeling

tl;dr

‣ Model incrementally‣ Always profile your queries‣ Consider making the implicit explicit…

• ...but beware the maintenance cost

‣ Be specific with relationship types‣ Ignore direction for bidirectional relationships

Page 80: Tips and Tricks for Graph Data Modeling

That’s all for today!Questions? :-)

Mark Needham @markhneedhamhttps://github.com/neo4j-meetups/modeling-worked-example