Graph Databases for SQL Server Professionals Stéphane Fréchette Thursday September 18, 2014

Upload: stephane-frechette

Post on 03-Dec-2014


DESCRIPTION

Graph databases represent graph structures with nodes, edges and properties. Neo4j, an open-source graph database, is reliable and fast for managing and querying highly connected data. This session explores how to install and configure Neo4j, create nodes and relationships, query with the Cypher Query Language, import data, and use Neo4j in concert with SQL Server, providing answers and insight with visual diagrams about the connected data you have in your SQL Server databases!

TRANSCRIPT

Page 1: Graph Databases for SQL Server Professionals

Graph Databases for SQL Server Professionals

Stéphane Fréchette

Thursday, September 18, 2014

Page 2: Graph Databases for SQL Server Professionals

Who am I?

My name is Stéphane Fréchette

SQL Server MVP | Consultant | Speaker | Database & BI Architect | NoSQL. Drums, good food and fine wine. Founder @ukubu, @GatineauOuverte, @TEDxGatineau

I have a passion for architecting, designing and building solutions that matter.

Twitter: @sfrechette
Blog: stephanefrechette.com
Email: [email protected]

Page 3: Graph Databases for SQL Server Professionals

Session Outline

• What is a Graph?
• What is Neo4j?
• Data Modeling – The Property Graph
• Cypher Query Language
• Importing Data…
• Use Cases
• Demos
• Resources

Page 4: Graph Databases for SQL Server Professionals

What is a Graph?

Page 5: Graph Databases for SQL Server Professionals

Are these Graphs?

Page 6: Graph Databases for SQL Server Professionals

This is a Graph

Node

Relationship

A Property Graph

Page 7: Graph Databases for SQL Server Professionals

Organization Project Graph

Page 8: Graph Databases for SQL Server Professionals

Twitter Social Graph

Page 9: Graph Databases for SQL Server Professionals

What is Neo4j?

An open-source graph database by Neo Technology. Neo4j stores data in nodes connected by directed, typed relationships, with properties on both. This model is also known as a Property Graph.

• Fully ACID compliant
• Massively scalable, up to several billion nodes/relationships/properties
• Highly available when distributed across multiple machines
• Accessible by a convenient REST interface or an object-oriented Java API
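The property graph model described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (nodes, plus directed and typed relationships, each with its own property map). The class names, the field names, the Member-to-Meetup direction and the `since` property are assumptions made for the example, not Neo4j's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    label: str                                    # e.g. "Member" or "Meetup"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                   # directed: start -> end
    end: Node
    rel_type: str                                 # typed, e.g. "IS_ORGANIZER"
    properties: Dict[str, Any] = field(default_factory=dict)


stephane = Node("Member", {"name": "Stephane"})
sql_ug = Node("Meetup", {"name": "Ottawa SQL Server User Group"})

# Properties can sit on the relationship as well as on the nodes.
# The "since" value is made up for illustration.
organizes = Relationship(stephane, sql_ug, "IS_ORGANIZER", {"since": 2014})

print(f"({organizes.start.properties['name']})"
      f"-[:{organizes.rel_type}]->"
      f"({organizes.end.properties['name']})")
```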

Page 10: Graph Databases for SQL Server Professionals

Data Modeling

From SQL Server to Graph

Property Graph

Page 11: Graph Databases for SQL Server Professionals

Example: Meetup Data in SQL Server

Member

ID  Member
1   Daniel
2   Stephane
3   John
4   Randy

Meetup

ID  Name
1   Ottawa SQL Server User Group
2   Ottawa JavaScript
3   Ottawa Visio User Group
4   Ottawa Tableau User Group
5   Dirty Dancing Ottawa

MeetupOrganizer

MemberID  MeetupID
2         1
1         2
3         3
2         4
3         5

MeetupMember

MemberID  MeetupID
3         1
3         2
4         2
4         4
1         5

Page 12: Graph Databases for SQL Server Professionals

Example: Meetup Data in a Graph

Node labels: Member, Meetup

Member nodes:
name: 'Daniel'
name: 'Stephane'
name: 'John'
name: 'Randy'

Meetup nodes:
name: 'Ottawa SQL Server User Group'
name: 'Ottawa JavaScript'
name: 'Ottawa Visio User Group'
name: 'Ottawa Tableau User Group'
name: 'Dirty Dancing Ottawa'

Relationship types: IS_ORGANIZER (five in the diagram), IS_MEMBER (five in the diagram)
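The move from the SQL Server tables to the graph can be sketched as follows: each entity row becomes a node and each junction-table row becomes a relationship. This is a rough Python illustration, assuming the first junction table is MeetupOrganizer and the second is MeetupMember (the order of the table captions), and assuming relationships point from Member to Meetup, which the transcript does not state.

```python
# Rows copied from the four SQL Server tables on the previous slide.
members = {1: "Daniel", 2: "Stephane", 3: "John", 4: "Randy"}
meetups = {
    1: "Ottawa SQL Server User Group",
    2: "Ottawa JavaScript",
    3: "Ottawa Visio User Group",
    4: "Ottawa Tableau User Group",
    5: "Dirty Dancing Ottawa",
}
meetup_organizer = [(2, 1), (1, 2), (3, 3), (2, 4), (3, 5)]  # (MemberID, MeetupID)
meetup_member = [(3, 1), (3, 2), (4, 2), (4, 4), (1, 5)]     # (MemberID, MeetupID)

# Entity rows become nodes, keyed by (label, id).
nodes = {("Member", i): {"name": n} for i, n in members.items()}
nodes.update({("Meetup", i): {"name": n} for i, n in meetups.items()})

# Junction rows become relationships (Member -> Meetup is an assumption).
relationships = (
    [(("Member", m), "IS_ORGANIZER", ("Meetup", g)) for m, g in meetup_organizer]
    + [(("Member", m), "IS_MEMBER", ("Meetup", g)) for m, g in meetup_member]
)

for start, rel_type, end in relationships:
    print(f"({nodes[start]['name']})-[:{rel_type}]->({nodes[end]['name']})")
```

The junction tables disappear as tables: their rows survive only as typed relationships between nodes.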

Page 13: Graph Databases for SQL Server Professionals

Cypher Query Language

Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store

• Pattern-matching
• Declarative: what to retrieve, not how to retrieve it
• Inspired by other well-known languages (SQL, SPARQL, Haskell, Python)
• Aggregation, Ordering, Limit
• Update the Graph
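To make "pattern-matching" and "declarative" concrete, here is a small sketch that puts a Cypher pattern next to an imperative Python version of the same question: who shares a meetup with Randy? The Cypher text is an illustration only. Its labels, property names and relationship direction are assumptions, and the membership pairs are taken from the meetup example earlier in the deck.

```python
# A Cypher pattern describes the shape to find, not the join steps.
# Labels, property names and relationship direction are assumptions.
CYPHER = """
MATCH (me:Member {name: 'Randy'})-[:IS_MEMBER]->(g:Meetup)<-[:IS_MEMBER]-(other:Member)
RETURN DISTINCT other.name AS name
ORDER BY name
"""

# The same question answered step by step over (member, meetup) pairs.
is_member = [
    ("John", "Ottawa SQL Server User Group"),
    ("John", "Ottawa JavaScript"),
    ("Randy", "Ottawa JavaScript"),
    ("Randy", "Ottawa Tableau User Group"),
    ("Daniel", "Dirty Dancing Ottawa"),
]


def co_members(person):
    """Return the sorted names of members who share a meetup with person."""
    my_groups = {g for m, g in is_member if m == person}
    return sorted({m for m, g in is_member if g in my_groups and m != person})


print(co_members("Randy"))
```

The Python version spells out how to walk the data, while the Cypher version only states the shape of the answer.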

Page 14: Graph Databases for SQL Server Professionals

Cypher and T-SQL

Cypher also has a number of keywords with direct equivalents in SQL, which makes it a curiously familiar language

• WHERE
• ORDER BY
• LIMIT
• SUM, COUNT, STDEVP, MIN, MAX, etc.
• LTRIM, UPPER, LOWER, REPLACE, LEFT, RIGHT, SUBSTRING
• DISTINCT
• CASE

(SQL Server Pros)-[:WILL_LOVE]->(Cypher)
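A side-by-side sketch of the familiar keywords: the same "meetups per member" question as a SQL join with COUNT, ORDER BY and LIMIT, and as an assumed Cypher counterpart. SQLite from the Python standard library stands in for SQL Server so the example can run anywhere. In T-SQL itself, LIMIT is written as TOP (n) or OFFSET … FETCH. The `Name` column and the Cypher labels are assumptions made for the example.

```python
import sqlite3

# SQLite stands in for SQL Server here; T-SQL would use TOP (2) instead of LIMIT 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Member (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE MeetupMember (MemberID INTEGER, MeetupID INTEGER);
INSERT INTO Member VALUES (1,'Daniel'),(2,'Stephane'),(3,'John'),(4,'Randy');
INSERT INTO MeetupMember VALUES (3,1),(3,2),(4,2),(4,4),(1,5);
""")

SQL = """
SELECT m.Name AS name, COUNT(*) AS meetups
FROM Member AS m
JOIN MeetupMember AS mm ON mm.MemberID = m.ID
GROUP BY m.Name
ORDER BY meetups DESC, name
LIMIT 2
"""

# Assumed Cypher counterpart: the JOIN becomes a relationship pattern, and
# grouping is implied by the non-aggregated column in RETURN.
CYPHER = """
MATCH (m:Member)-[:IS_MEMBER]->(:Meetup)
RETURN m.name AS name, count(*) AS meetups
ORDER BY meetups DESC, name
LIMIT 2
"""

rows = conn.execute(SQL).fetchall()
print(rows)
```

Only the SQL side is executed here. The Cypher text is shown for comparison and would need a running Neo4j instance.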

Page 15: Graph Databases for SQL Server Professionals

Cypher - Meetup

Page 16: Graph Databases for SQL Server Professionals

Neo4j Browser

Page 17: Graph Databases for SQL Server Professionals

Demo (let’s query some data…)

Page 18: Graph Databases for SQL Server Professionals

Importing Data…

Page 19: Graph Databases for SQL Server Professionals

Importing Data…

Some important considerations…

Different import scenarios

• Dataset size: 1000s, 100000s, 10000000s
• Dataset format (source): Database, File (CSV, Spreadsheet, GraphML, Geoff), Service, Other
• Import type: Initial Bulk Load, Incremental Load, Initial Bulk Load + Incremental Load

Different import tools

• Spreadsheet based
• Neo4j-shell based: (Cypher, neo4j-shell-tools, Cypher LOAD CSV)
• Command-line based: Batch Importer
• Neo4j Browser based
• ETL Tools: (Talend, Mulesoft, Pentaho Kettle)
• Custom software: (Java API, REST API, Spring Data Neo4j)

Page 20: Graph Databases for SQL Server Professionals

Many different mappings

Not always clear what you should be using

Depends on your skill set, dataset size… (lots of other stuff)

Choose wisely!

Import Scenarios Import Tools

Page 21: Graph Databases for SQL Server Professionals

Demo (walkthrough on importing data…)

Page 22: Graph Databases for SQL Server Professionals

The Sample Dataset

Page 23: Graph Databases for SQL Server Professionals

Importing using Spreadsheets

Very small datasets (< 1000), easy to use

Format data in spreadsheet

Generate Cypher statements with formulas

Copy and execute Cypher in Neo4j browser
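The spreadsheet technique amounts to string concatenation: one formula per row turns cell values into a Cypher statement, for instance something along the lines of `="CREATE (:Member {id: "&A2&", name: '"&B2&"'});"`. The sketch below does the same in Python. The label, the property names and the helper functions are hypothetical and chosen only for illustration.

```python
# Rows as they might appear in the spreadsheet (ID, Member).
rows = [(1, "Daniel"), (2, "Stephane"), (3, "John"), (4, "Randy")]


def cypher_string(value):
    # Escape backslashes and single quotes so the value stays a valid
    # single-quoted Cypher string literal.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_member(member_id, name):
    # One CREATE statement per row, like one formula per spreadsheet row.
    return f"CREATE (:Member {{id: {member_id}, name: {cypher_string(name)}}});"


statements = [create_member(i, n) for i, n in rows]
print("\n".join(statements))
```

Names containing quotes are the usual trap with hand-built statements, hence the escaping helper.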

Page 24: Graph Databases for SQL Server Professionals

Importing using Spreadsheets

Page 25: Graph Databases for SQL Server Professionals

Importing using neo4j-shell-tools

Small to medium size datasets

https://github.com/jexp/neo4j-shell-tools

Format data in CSV files

Create import-cypher commands for neo4j-shell-tools

Execute commands from neo4j-shell

Page 26: Graph Databases for SQL Server Professionals

Importing using neo4j-shell-tools

Page 27: Graph Databases for SQL Server Professionals

Importing using LOAD CSV

Native Cypher

Format data in CSV files

Create “LOAD CSV” commands

Execute command from neo4j-shell or browser

Additional “cleanup” for Labels and RelTypes
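The LOAD CSV steps above can be sketched as follows: write a CSV file with a header row, then compose a LOAD CSV statement that reads it. The file name, the placeholder path in the statement, the label and the property names are assumptions made for the example. The statement is only printed here and would be run from neo4j-shell or the browser.

```python
import csv
import os
import tempfile

rows = [(1, "Daniel"), (2, "Stephane"), (3, "John"), (4, "Randy")]

# Step 1: format the data as CSV with a header row (file name is arbitrary).
csv_path = os.path.join(tempfile.mkdtemp(), "members.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])
    writer.writerows(rows)

# Step 2: compose the LOAD CSV statement (the path is a placeholder).
# LOAD CSV reads every field as a string, so the id is converted explicitly.
# toInt() is the Neo4j 2.x spelling; later versions use toInteger().
# The label is written literally because it cannot come from a CSV column.
load_csv = (
    "LOAD CSV WITH HEADERS FROM 'file:///path/to/members.csv' AS row\n"
    "CREATE (:Member {id: toInt(row.id), name: row.name});"
)
print(load_csv)
```

Because labels and relationship types have to be written literally in the statement, data that carries them in a column needs a separate pass. This is presumably the "cleanup" the slide refers to.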

Page 28: Graph Databases for SQL Server Professionals

Importing using LOAD CSV

Page 29: Graph Databases for SQL Server Professionals

Importing using Batch Importer

Non-transactional import, suited to very large datasets

Format data in TSV files

Execute Batch Import command

Copy store files to Neo4j Server directory

Start Neo4j Server with generated store files

Page 30: Graph Databases for SQL Server Professionals

Use Cases

Principal uses of graph databases include:

• Network and Data Center Management (Queries: Impact Analysis, Root Cause Analysis, Quality-of-Service Mapping, Asset Management)
• Authorization and Access (Queries: Access Management, Interconnected Group Organization, Provenance)
• Social (Queries: Friend Recommendations, Sharing & Collaboration, Influencer Analysis)
• Geo (Queries: Routing, Logistics, Capacity Planning)
• Recommendations (Queries: Product, Social, Service, and Professional Recommendations)
• Fraud Detection

http://www.neotechnology.com/neo4j-use-cases/

Page 31: Graph Databases for SQL Server Professionals

Summary

(graphs)-[:ARE]->(everywhere)

Page 32: Graph Databases for SQL Server Professionals

Resources

• Neo Technology http://www.neotechnology.com/

• Neo4j.org (Learn, Develop, Downloads,…) http://www.neo4j.org/

• Neo4j on Vimeo http://vimeo.com/neo4j

• Neo4j on SlideShare http://www.slideshare.net/neo4j

• Neo4j on Github https://github.com/neo4j

• Neo4j Cypher Cheat Sheet http://docs.neo4j.org/refcard/2.1/

• Neo4j Graph Database as a Service http://www.graphenedb.com/

• Linkurious – The easiest way to explore graph databases http://linkurio.us/

• KeyLines – Visualize dynamic networks http://keylines.com/

• Experiments with NEO4J: Using a graph database as a SQL Server metadata hub http://bit.ly/V2PrxN

• Kenny Bastani http://www.kennybastani.com/

• Rik Van Bruggen http://blog.bruggen.com/

• Max de Marzi http://maxdemarzi.com/

• Better Software Development http://jexp.de/blog/

• Graph Databases (Free Book) http://graphdatabases.com/

• Neo4j GraphGist http://gist.neo4j.org/

• GraphConnect Conference http://graphconnect.com/

• Titan – Distributed Graph Database https://thinkaurelius.github.io/titan/

• InfiniteGraph http://www.infinitegraph.com/

• OrientDB http://www.orientechnologies.com/

• Cayley by Google https://github.com/google/cayley

Page 33: Graph Databases for SQL Server Professionals

What Questions Do You Have?

Page 34: Graph Databases for SQL Server Professionals

Thank You

For attending this session