A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

A Little Graph Theory for the Busy Developer Jim Webber Chief Scientist, Neo Technology @jimwebber

Upload: neo4j-the-open-source-graph-database

Post on 15-Jan-2015


Category:

Technology



DESCRIPTION

In this talk we'll explore powerful analytic techniques for graph data. Firstly we'll discover some of the innate properties of (social) graphs from fields like anthropology and sociology. By understanding the forces and tensions within the graph structure and applying some graph theory, we'll be able to predict how the graph will evolve over time. To test just how powerful and accurate graph theory is, we'll also be able to (retrospectively) predict World War 1 based on a social graph and a few simple mechanical rules. Then we'll see how graph matching can be used to extract online business intelligence (for powerful retail recommendations). In turn we'll apply these powerful techniques to modelling domains in Neo4j (a graph database) and show how Neo4j can be used to drive business intelligence. Don't worry, there won't be much maths :-)

TRANSCRIPT

Page 1: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

A Little Graph Theory for the Busy Developer

Jim Webber
Chief Scientist, Neo Technology

@jimwebber

Page 2: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Roadmap

• Imprisoned data
• Graph models
• Graph theory
– Local properties, global behaviors
– Predictive techniques
• Graph matching
– Real-time analytics for fun and profit
• Fin

Page 3: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 4: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

http://www.flickr.com/photos/crazyneighborlady/355232758/

Page 5: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

http://gallery.nen.gov.uk/image82582-.html

Page 6: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 7: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Aggregate-Oriented Data
http://martinfowler.com/bliki/AggregateOrientedDatabase.html

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences.

This is why aggregate-oriented stores talk so much about map-reduce.”

Page 8: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 9: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

complexity = f(size, connectedness, uniformity)

Page 10: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 11: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

http://www.bbc.co.uk/london/travel/downloads/tube_map.html

Page 12: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Property graphs

• Property graph model:
– Nodes with properties
– Named, directed relationships with properties
– Relationships have exactly one start and end node
  • Which may be the same node
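The model above can be sketched as a small in-memory data structure. This is only an illustration of the three rules on the slide, not how Neo4j stores data; the class names, the `since` property and the `TALKS_TO` relationship are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    name: str    # relationships are named
    start: Node  # directed: exactly one start node
    end: Node    # and exactly one end node (may be the same node as start)
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.relationships: List[Relationship] = []

    def add_node(self, **properties: Any) -> Node:
        node = Node(dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, name: str, end: Node, **properties: Any) -> Relationship:
        rel = Relationship(name, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, name: str) -> List[Node]:
        # Follow relationships of the given name that start at `node`.
        return [r.end for r in self.relationships if r.start is node and r.name == name]


graph = PropertyGraph()
stan = graph.add_node(name="Stan")
kyle = graph.add_node(name="Kyle")
graph.relate(stan, "FRIEND", kyle, since=1997)  # hypothetical property
graph.relate(stan, "TALKS_TO", stan)            # start and end may be the same node
print([n.properties["name"] for n in graph.outgoing(stan, "FRIEND")])  # ['Kyle']
```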

Page 13: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

[Figure: example property graph with the relationships stole from, loves, enemy, companion and appeared in; the episode nodes "A Good Man Goes to War" and "Victory of the Daleks" are the targets of the appeared in relationships]

Page 14: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Property graphs are very whiteboard-friendly

Page 15: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

http://blogs.adobe.com/digitalmarketing/analytics/predictive-analytics/predictive-analytics-and-the-digital-marketer/

Page 16: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg

Meet Leonhard Euler
• Swiss mathematician
• Inventor of Graph Theory (1736)

Page 17: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Königsberg (Prussia) - 1736

Page 18: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

[Figure: the four land masses of Königsberg, labelled A, B, C and D]

Page 19: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

[Figure: land masses A, B, C and D joined by seven bridges, numbered 1 to 7]

Page 20: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
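Euler's argument for the seven bridges comes down to counting how many bridges touch each land mass: a walk that crosses every bridge exactly once can exist in a connected graph only if zero or two land masses have an odd count. A minimal sketch follows; which letter stands for which land mass is an assumption here (the slide's layout did not survive), but the degree sequence 5, 3, 3, 3 is the classic one.

```python
from collections import Counter

# Seven bridges as (land mass, land mass) pairs.
# Assumption: C is the central island; the letter assignment is illustrative.
BRIDGES = [
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return count


def has_eulerian_walk(edges):
    """True if a connected multigraph has a walk using every edge exactly once."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_eulerian_walk(BRIDGES))  # False: all four land masses have odd degree
```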

Page 21: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 22: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Triadic Closure

[Figure: nodes name: Kyle, name: Stan and name: Kenny, joined by two FRIEND relationships]

Page 23: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Triadic Closure

[Figure, before: nodes name: Kyle, name: Stan and name: Kenny, joined by two FRIEND relationships]

[Figure, after: the same three nodes with a third FRIEND relationship closing the triangle]
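Triadic closure suggests a simple prediction rule: wherever one node is a friend of two others who are not yet friends, expect that missing relationship to form. A rough sketch of finding those open triads, assuming (from the slide layout) that Kyle is the node who is friends with both Stan and Kenny:

```python
from itertools import combinations


def open_triads(friendships):
    """Return (a, centre, c) where centre is a friend of both a and c, but a and c are not friends."""
    neighbours = {}
    for u, v in friendships:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    result = []
    for centre, friends in neighbours.items():
        for a, c in combinations(sorted(friends), 2):
            if c not in neighbours[a]:
                result.append((a, centre, c))
    return result


friendships = [("Kyle", "Stan"), ("Kyle", "Kenny")]
print(open_triads(friendships))                        # [('Kenny', 'Kyle', 'Stan')]
print(open_triads(friendships + [("Stan", "Kenny")]))  # [] once the triangle is closed
```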

Page 24: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Structural Balance

[Figure: nodes name: Cartman, name: Craig and name: Tweek, joined by one ENEMY and one FRIEND relationship]

Page 25: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Structural Balance

[Figure, before: nodes name: Cartman, name: Craig and name: Tweek, joined by one ENEMY and one FRIEND relationship]

[Figure, after: the same three nodes with the triangle closed by a FRIEND relationship]

Page 26: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Structural Balance

[Figure, before: nodes name: Cartman, name: Craig and name: Tweek, joined by one ENEMY and one FRIEND relationship]

[Figure, after: the same three nodes with the triangle closed by an ENEMY relationship]

Page 27: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Structural Balance

[Figure, before: nodes name: Kyle, name: Stan and name: Kenny, joined by two FRIEND relationships]

[Figure, after: the same three nodes with the triangle closed by a FRIEND relationship]

Page 28: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Structural Balance is a key predictive technique

And it’s domain-agnostic
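In the usual formulation of structural balance, a triangle is balanced when it has three friendly edges, or one friendly edge and two hostile ones. Encoding FRIEND as +1 and ENEMY as -1 (an encoding chosen here for illustration), that is the same as saying the product of the three signs is positive. A small sketch over the triangles shown on the preceding slides:

```python
FRIEND, ENEMY = +1, -1


def is_balanced(signs):
    """A signed triangle is balanced when the product of its three signs is positive."""
    product = 1
    for sign in signs:
        product *= sign
    return product > 0


print(is_balanced([FRIEND, FRIEND, FRIEND]))  # True: three mutual friends
print(is_balanced([ENEMY, FRIEND, ENEMY]))    # True: two friends share an enemy
print(is_balanced([ENEMY, FRIEND, FRIEND]))   # False: a friend of both enemies is under tension
print(is_balanced([ENEMY, ENEMY, ENEMY]))     # False: two of the three have reason to team up
```

Because the rule looks only at signs, it applies equally to playground friendships and to international alliances.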

Page 29: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Allies and Enemies

[Figure: graph of the UK, Germany, France, Russia, Italy and Austria]

Page 30: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Allies and Enemies

[Figure: graph of the UK, Germany, France, Russia, Italy and Austria]

Page 31: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Allies and Enemies

[Figure: graph of the UK, Germany, France, Russia, Italy and Austria]

Page 32: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Allies and Enemies

[Figure: graph of the UK, Germany, France, Russia, Italy and Austria]

Page 33: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Allies and Enemies

[Figure: graph of the UK, Germany, France, Russia, Italy and Austria]

Page 34: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Allies and Enemies

[Figure: graph of the UK, Germany, France, Russia, Italy and Austria]

Page 35: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Predicting WWI [Easley and Kleinberg]
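As a rough illustration of the end state, the sketch below checks every triangle among the six powers for balance. The edge signs are an assumption supplied here, not taken from the slides: they encode the familiar pre-war blocs (UK, France and Russia on one side; Germany, Austria and Italy on the other), with allies as +1 and opponents as -1.

```python
from itertools import combinations

# Assumption: bloc membership as of the 1907 alliances; not recovered from the slides.
ENTENTE = {"UK", "France", "Russia"}
ALLIANCE = {"Germany", "Austria", "Italy"}


def bloc_sign(a, b):
    """+1 if both countries sit in the same bloc, otherwise -1."""
    same_bloc = {a, b} <= ENTENTE or {a, b} <= ALLIANCE
    return 1 if same_bloc else -1


def unbalanced_triangles(countries, sign):
    """List every triple whose three signs multiply to a negative number."""
    return [
        (a, b, c)
        for a, b, c in combinations(sorted(countries), 3)
        if sign(a, b) * sign(b, c) * sign(a, c) < 0
    ]


print(unbalanced_triangles(ENTENTE | ALLIANCE, bloc_sign))  # []
```

Two opposing camps with no internal enmity leave no unbalanced triangle, which is why the graph stops evolving at that point.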

Page 36: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Strong Triadic Closure

If a node has strong relationships to two neighbours, then these neighbours must have at least a weak relationship between them.

[Wikipedia]
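The property can be tested mechanically: for every node, take each pair of its strong neighbours and check that some relationship, strong or weak, joins them. A minimal sketch, with hypothetical edge lists using the South Park names from this deck:

```python
from itertools import combinations


def stc_violations(strong, weak):
    """Return (a, node, c) where node has strong ties to a and c, but a and c share no tie at all."""
    any_tie = {frozenset(e) for e in strong} | {frozenset(e) for e in weak}
    strong_neighbours = {}
    for u, v in strong:
        strong_neighbours.setdefault(u, set()).add(v)
        strong_neighbours.setdefault(v, set()).add(u)
    return [
        (a, node, c)
        for node, nbrs in strong_neighbours.items()
        for a, c in combinations(sorted(nbrs), 2)
        if frozenset((a, c)) not in any_tie
    ]


strong = [("Kenny", "Stan"), ("Kenny", "Cartman")]  # hypothetical strong ties
print(stc_violations(strong, weak=[]))                     # [('Cartman', 'Kenny', 'Stan')]
print(stc_violations(strong, weak=[("Stan", "Cartman")]))  # []
```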

Page 37: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Triadic Closure (weak relationship)

[Figure: nodes name: Kenny, name: Stan and name: Cartman, joined by two FRIEND relationships]

Page 38: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Triadic Closure (weak relationship)

[Figure, before: nodes name: Kenny, name: Stan and name: Cartman, joined by two FRIEND relationships]

[Figure, after: the same three nodes with the triangle closed by a weak "FRIEND 50%" relationship]

Page 39: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Weak relationships

• Relationships can have “strength” as well as intent
– Think: weighting on a relationship in a property graph
• Weak links play another super-important structural role in graph theory
– They bridge neighbourhoods

Page 40: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Local Bridges

[Figure: nodes name: Kenny, name: Stan, name: Kyle, name: Sally, name: Bebe, name: Wendy and name: Cartman, joined by FRIEND, "FRIEND 50%" and ENEMY relationships]

Page 41: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Local Bridge Property

“If a node A in a network satisfies the Strong Triadic Closure Property and is involved in at least two strong relationships, then any local bridge it is involved in must be a weak relationship.”

[Easley and Kleinberg]
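A local bridge, in Easley and Kleinberg's terms, is an edge whose two endpoints have no neighbour in common. A short sketch of detecting them; the edge list is hypothetical (two friendship triangles joined by a single tie) and only borrows names from the Local Bridges slide.

```python
def local_bridges(edges):
    """Return the edges whose endpoints share no common neighbour."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return [(u, v) for u, v in edges if not (neighbours[u] & neighbours[v])]


# Hypothetical layout: which pair is joined across the groups is an assumption.
edges = [
    ("Kenny", "Stan"), ("Stan", "Kyle"), ("Kyle", "Kenny"),
    ("Sally", "Bebe"), ("Bebe", "Wendy"), ("Wendy", "Sally"),
    ("Stan", "Wendy"),
]
print(local_bridges(edges))  # [('Stan', 'Wendy')]
```

If Stan and Wendy each have two strong ties inside their own triangle, the property quoted above says the tie between them has to be a weak one.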

Page 42: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

University Karate Club

Page 43: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Graph Partitioning

• (NP) Hard problem
– Recursively remove the spanning links between dense regions
– Or recursively merge nodes into ever larger “subgraph” nodes
– Choose your algorithm carefully; some are better than others for a given domain
• Can use to (almost exactly) predict the break up of the karate club!
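One common reading of "recursively remove the spanning links" is a divisive method in the style of Girvan and Newman: repeatedly delete the edge that the most shortest paths run through until the graph falls apart. The slides do not name the algorithm used, so treat the following as a sketch of that family on a hypothetical toy graph, for simple undirected graphs only.

```python
from collections import deque


def components(adj):
    """Connected components of an adjacency dict, found by breadth-first search."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in part:
                    part.add(nxt)
                    queue.append(nxt)
        seen |= part
        parts.append(part)
    return parts


def edge_betweenness(adj):
    """Brandes-style count of the shortest paths that cross each undirected edge."""
    score = {}
    for source in adj:
        dist, paths, order = {source: 0}, {source: 1.0}, []
        parents = {node: [] for node in adj}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    paths[nxt] = 0.0
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    parents[nxt].append(node)
        credit = {node: 0.0 for node in order}
        for node in reversed(order):  # farthest nodes first
            for parent in parents[node]:
                share = paths[parent] / paths[node] * (1.0 + credit[node])
                edge = frozenset((parent, node))
                score[edge] = score.get(edge, 0.0) + share
                credit[parent] += share
    return score


def split_once(edges):
    """Remove highest-betweenness edges until the graph gains one more component."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        score = edge_betweenness(adj)
        if not score:  # no edges left to remove
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy graph: two triangles joined by the single link c-d.
toy = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
print(split_once(toy))
```

Calling `split_once` again on each resulting part gives the recursive version; where to stop is a modelling decision that depends on the domain.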

Page 44: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

University Karate Clubs (predicted by Graph Theory)

[Figure: predicted partition of the karate club graph, with node 9 labelled]

Page 45: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

University Karate Clubs (what actually happened!)

Page 46: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 47: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Cypher

• Declarative graph pattern matching language
– “SQL for graphs”
– Columnar results
• Supports graph matching commands and queries
– Find me stuff like this…
– Aggregation, ordering and limit, etc.

Page 48: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 49: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

[Figure: purchase graph]

Customer: Firstname: Mickey, Surname: Smith, DoB: 19781006

Products:
• SKU: 5e175641, Product: Badgers Nadgers Ale
• SKU: 2555f258, Product: Peewee Pilsner
• SKU: 49d102bc, Product: Baby Dry Nights
• SKU: 49d102bc, Product: XBox 360

Categories: beer, nappies, baby, alcoholic drinks, consumer electronics, console

Relationships: four BOUGHT (customer to product) and seven MEMBER_OF (into categories)

Page 50: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 51: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

[Figure: query pattern. A customer node (Firstname: *, Surname: *, DoB: 1996 > x > 1972) with BOUGHT relationships to Category: beer, Category: nappies and Category: game console]

Page 52: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

[Figure: query pattern. A customer node (Firstname: *, Surname: *, DoB: 1996 > x > 1972) with BOUGHT relationships to Category: beer and Category: nappies, and a negated !BOUGHT relationship to Category: game console]

Page 53: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

[Figure: the same pattern annotated in Cypher notation. Nodes (daddy), (beer), (nappies) and (console), with anonymous product nodes () in between, joined by -[:BOUGHT]->, -[:MEMBER_OF]-> and -[b:BOUGHT]-> relationships]

Page 54: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Flatten the graph

(daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies)
(daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer)
(daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)

Page 55: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Wrap in a Cypher MATCH clause

MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)

Page 56: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Cypher WHERE clause

MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)
WHERE b is null

Page 57: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Full Cypher query

START beer=node:categories(category='beer'),
      nappies=node:categories(category='nappies'),
      xbox=node:products(product='xbox 360')

MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[b?:BOUGHT]->(xbox)

WHERE b is null

RETURN distinct daddy

Page 58: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Results

==> +---------------------------------------------+
==> | daddy                                       |
==> +---------------------------------------------+
==> | Node[15]{name:"Rory Williams",dob:19880121} |
==> +---------------------------------------------+
==> 1 row
==> 0 ms
==> neo4j-sh (0)$
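To make the shape of the match concrete, here is the same question asked over plain Python dictionaries: who bought something in the beer category and something in the nappies category, but has not bought the XBox 360? The purchase data is hypothetical toy data (the slides do not show what Rory Williams bought), and this mirrors only the logic of the query, not how Cypher executes it.

```python
# Hypothetical toy data: who bought which product, and each product's category.
BOUGHT = {
    "Rory Williams": {"Peewee Pilsner", "Baby Dry Nights"},
    "Mickey Smith": {"Badgers Nadgers Ale", "Peewee Pilsner", "Baby Dry Nights", "XBox 360"},
}
MEMBER_OF = {
    "Badgers Nadgers Ale": "beer",
    "Peewee Pilsner": "beer",
    "Baby Dry Nights": "nappies",
    "XBox 360": "console",
}


def daddies_without_xbox(bought, member_of):
    """People with a beer purchase and a nappies purchase, but no XBox 360 purchase."""
    matches = []
    for person, products in bought.items():
        categories = {member_of[p] for p in products}
        if {"beer", "nappies"} <= categories and "XBox 360" not in products:
            matches.append(person)
    return sorted(matches)


print(daddies_without_xbox(BOUGHT, MEMBER_OF))  # ['Rory Williams']
```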

Page 59: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 60: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Facebook Graph Search

Which sushi restaurants in NYC do my friends like?

Page 61: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Graph Structure

Page 62: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Cypher query

START me=node:person(name = 'Jim'),
      location=node:location(location='New York'),
      cuisine=node:cuisine(cuisine='Sushi')

MATCH (me)-[:IS_FRIEND_OF]->(friend)-[:LIKES]->(restaurant)-[:LOCATED_IN]->(location),
      (restaurant)-[:SERVES]->(cuisine)

RETURN restaurant

Page 63: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Search structure

Page 64: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013
Page 65: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

What are graphs good for?
• Recommendations
• Pharmacology
• Business intelligence
• Social computing
• Geospatial
• MDM
• Data center management
• Web of things
• Genealogy
• Time series data
• Product catalogue
• Web analytics
• Scientific computing
• Indexing your slow RDBMS
• And much more!

Page 66: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Free O’Reilly book for everyone!

http://graphdatabases.com

Page 67: A Little Graph Theory for the Busy Developer - Jim Webber @ GraphConnect Chicago 2013

Thanks for listening
Neo4j: http://neo4j.org
Me: @jimwebber