Introduction to Neo4j and .Net
Agenda
• Neo4j Introduction
• Relational Pains – Graph Pleasure
• Data Modeling
• Query with Cypher
• Neo4j and .Net
• Drivers & Azure
• Demo
• Q&A
What is it with Relationships?
• The world is full of connected people, events and things
• There is “Value in Relationships”!
• What about data relationships?
• How do you store your object model?
• How do you explain JOIN tables to your boss?
Neo4j – allows you to connect the dots
• Built to efficiently store, query and manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable on a few machines
Value from Data Relationships: Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
Neo4j Browser – First Class Graph Visualization
• Graph Visualization
• Tabular Results
• Visual Query Plan
• X-Ray Mode
• Export to CSV, JSON, PNG, SVG
• Graph Style Sheet
• Auto-Retrieve Connections
• Much more … to come.
The Whiteboard Model is the Physical Model
• Eliminates graph-to-relational mapping
• In your data model: bridge the gap between business and IT models
• In your application: greatly reduce the need for application code
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Diagram: two Person nodes, Dan (name: “Dan”, born: May 29, 1970, twitter: “@dan”) and Ann (name: “Ann”, born: Dec 5, 1975), and a Car node (brand: “Volvo”, model: “V70”), connected by LOVES (twice), LIVES WITH, OWNS and DRIVES relationships; one relationship has the property since: Jan 10, 2011.]
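To make the two building blocks concrete, here is a minimal in-memory C# sketch of the Dan/Ann example: nodes carry a label and name-value properties, and relationships carry a type, a direction and optional properties of their own. This is not how Neo4j stores data. The names `nodes` and `rels`, and the choice of Ann as the one who DRIVES the car (with the `since` property), are assumptions for illustration, since the transcript does not say which relationship carries that property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Nodes: key -> (label, name-value properties). Hypothetical sample data from the slide.
var nodes = new Dictionary<string, (string Label, Dictionary<string, string> Props)>
{
    ["dan"] = ("Person", new Dictionary<string, string> { ["name"] = "Dan", ["born"] = "1970-05-29", ["twitter"] = "@dan" }),
    ["ann"] = ("Person", new Dictionary<string, string> { ["name"] = "Ann", ["born"] = "1975-12-05" }),
    ["car"] = ("Car", new Dictionary<string, string> { ["brand"] = "Volvo", ["model"] = "V70" }),
};

// Relationships: a type and a direction (From -> To), plus their own properties.
// Assumption: Ann DRIVES the car, and that relationship holds the "since" property.
var rels = new List<(string From, string Type, string To, Dictionary<string, string> Props)>
{
    ("dan", "LOVES", "ann", new Dictionary<string, string>()),
    ("ann", "LOVES", "dan", new Dictionary<string, string>()),
    ("ann", "DRIVES", "car", new Dictionary<string, string> { ["since"] = "2011-01-10" }),
};

// Follow outgoing LOVES relationships from Dan to Person nodes, in the spirit of
// MATCH (:Person {name:"Dan"})-[:LOVES]->(p:Person) RETURN p.name
var lovedByDan = rels
    .Where(r => r.From == "dan" && r.Type == "LOVES" && nodes[r.To].Label == "Person")
    .Select(r => nodes[r.To].Props["name"])
    .ToList();

// Relationship properties are read from the relationship itself, not from either node.
var since = rels.First(r => r.Type == "DRIVES").Props["since"];

Console.WriteLine(string.Join(", ", lovedByDan)); // Ann
Console.WriteLine(since);                         // 2011-01-10
```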
Cypher: Powerful and Expressive Query Language
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Diagram: the pattern annotated as NODE (with LABEL and PROPERTY) -LOVES-> NODE (with LABEL and PROPERTY), from Dan to Ann.]
Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
4.58 million things and their relationships… load in 100 seconds!
Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with the number and levels of relationships, and with database size
• Query complexity grows with the need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real time
• Slow development
• Poor performance
• Low scalability
• Hard to maintain
Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Drive graph model from domain and use-cases
• Use relationship information in real time to transform your business
• Add new relationships on the fly to adapt to your changing requirements
High Query Performance with a Native Graph DB
• Relationships are first-class citizens
• No need for joins: just follow the pre-materialized relationships of nodes
• Query and data locality: navigate out from your starting points
• Only load what’s needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs
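A simplified C# sketch of the difference between joining and following pre-materialized relationships. The join version scans every row of a join table on each hop (an index would narrow the search, but each hop would still be a lookup in a shared, global structure); the graph version stores each node's relationships at the node's own slot, so a hop reads just that node's list. The sample data and the array-per-node layout are illustrative assumptions, not Neo4j's actual store format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ACTED_IN data as (personId, movieId) pairs.
var actedIn = new List<(int PersonId, int MovieId)> { (1, 10), (1, 11), (2, 10), (3, 11) };

// Join-table style: every hop searches all rows for matching keys.
List<int> MoviesViaJoin(int personId) =>
    actedIn.Where(r => r.PersonId == personId).Select(r => r.MovieId).ToList();

// Graph style: relationships are materialized once per node, at the node's own slot
// (node id = array index). A hop is a direct offset plus reading that node's list,
// so its cost depends on the node's degree, not on how many rows exist overall.
const int NodeCount = 12; // hypothetical node ids 0..11
var adjacency = new List<int>[NodeCount];
for (var i = 0; i < NodeCount; i++) adjacency[i] = new List<int>();
foreach (var (person, movie) in actedIn) adjacency[person].Add(movie);

List<int> MoviesViaAdjacency(int personId) => adjacency[personId];

Console.WriteLine(string.Join(",", MoviesViaJoin(1)));      // 10,11
Console.WriteLine(string.Join(",", MoviesViaAdjacency(1))); // 10,11
```

Both functions return the same answer; the difference is only in how much data each hop has to touch.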
Express Complex Queries Easily with Cypher
Find all reports and how many people they manage, each up to 3 levels down
Cypher Query
MATCH (boss)-[:MANAGES*0..3]->(mgr)
WHERE boss.name = "John Doe" AND (mgr)-[:MANAGES]->()
RETURN mgr.name AS Manager, size((mgr)-[:MANAGES*1..3]->()) AS Total
SQL Query
[The equivalent SQL query was shown as an image and is not included in the transcript.]
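As a hedged illustration of what the Cypher query computes, the sketch below runs the same two variable-length walks over a small hypothetical org chart held in C# collections: first everyone 0..3 MANAGES steps below John Doe who manages somebody, then, for each of them, the number of people 1..3 levels down. The names `DirectReports` and `Reachable` are illustrative. It assumes a tree (each person has at most one manager); in a graph with several paths to the same person, Cypher's `MATCH` and `size(...)` count paths, so the results could differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical org chart: manager -> direct reports (the MANAGES relationships).
var manages = new Dictionary<string, List<string>>
{
    ["John Doe"] = new List<string> { "Ann", "Bob" },
    ["Ann"] = new List<string> { "Carl", "Dora" },
    ["Carl"] = new List<string> { "Eve" },
};

List<string> DirectReports(string name) =>
    manages.TryGetValue(name, out var reports) ? reports : new List<string>();

// Breadth-first walk: everyone reachable in minHops..maxHops MANAGES steps.
// Assumes a tree, so nobody is reached (or counted) twice.
List<string> Reachable(string start, int minHops, int maxHops)
{
    var result = new List<string>();
    var frontier = new List<string> { start };
    for (var depth = 0; depth <= maxHops && frontier.Count > 0; depth++)
    {
        if (depth >= minHops) result.AddRange(frontier);
        frontier = frontier.SelectMany(n => DirectReports(n)).ToList();
    }
    return result;
}

// (boss)-[:MANAGES*0..3]->(mgr) where mgr manages someone,
// then Total = number of people 1..3 levels below mgr.
var rows = Reachable("John Doe", 0, 3)
    .Where(mgr => DirectReports(mgr).Count > 0)
    .Select(mgr => (Manager: mgr, Total: Reachable(mgr, 1, 3).Count))
    .ToList();

foreach (var (manager, total) in rows)
    Console.WriteLine($"{manager}: {total}");
// John Doe: 5
// Ann: 3
// Carl: 1
```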
High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second per core
• Cost-based query optimizer: complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
Basic Pattern: Tom Hanks’ Movies?
MATCH (:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie {title: "Forrest Gump"})
[Diagram: the pattern annotated as NODE (with LABEL and PROPERTY) -ACTED_IN-> NODE (with LABEL and PROPERTY), from Tom Hanks to Forrest Gump.]
Basic Query: Tom Hanks’ Movies?
MATCH (actor:Person)-[:ACTED_IN]->(m:Movie)
WHERE actor.name = "Tom Hanks"
RETURN *
Query Comparison: Colleagues of Tom Hanks?
SQL Query
SELECT *
FROM Person AS actor
  JOIN ActorMovie AS am1 ON (actor.id = am1.actor_id)
  JOIN ActorMovie AS am2 ON (am1.movie_id = am2.movie_id)
  JOIN Person AS coll ON (coll.id = am2.actor_id)
WHERE actor.name = 'Tom Hanks'
Cypher Query
MATCH (actor:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coll:Person)
WHERE actor.name = "Tom Hanks"
RETURN *
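A small C# sketch of what the Cypher pattern does, over hypothetical sample rows: go out from each of Tom Hanks' ACTED_IN relationships to the movie, then come back in along the other ACTED_IN relationships of that movie. The sample data and variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample of ACTED_IN relationships as (actor, movie) rows.
var actedIn = new List<(string Actor, string Movie)>
{
    ("Tom Hanks", "Forrest Gump"), ("Robin Wright", "Forrest Gump"),
    ("Tom Hanks", "Cast Away"), ("Helen Hunt", "Cast Away"),
    ("Keanu Reeves", "The Matrix"),
};

// Walk out from each of Tom Hanks' ACTED_IN rows to the movie, then back in along
// every other ACTED_IN row of that movie. Skipping the row we arrived on mirrors
// Cypher's rule that a single pattern never reuses the same relationship.
var colleagues = actedIn
    .Where(start => start.Actor == "Tom Hanks")
    .SelectMany(
        start => actedIn.Where(other => other.Movie == start.Movie && other != start),
        (start, other) => (Colleague: other.Actor, Movie: start.Movie))
    .ToList();

foreach (var (colleague, movie) in colleagues)
    Console.WriteLine($"{colleague} ({movie})");
// Robin Wright (Forrest Gump)
// Helen Hunt (Cast Away)
```

The SQL on the slide has no equivalent exclusion: because am1 and am2 can be the same ActorMovie row, it also returns Tom Hanks as his own colleague, once per movie. A condition such as `AND am2.actor_id <> am1.actor_id` would remove those rows.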
Most prolific actors and their filmography?
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, count(*), collect(m.title) as movies
ORDER BY count(*) desc, p.name asc
LIMIT 10;
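The same aggregation, sketched with LINQ over a hypothetical in-memory sample, shows how the Cypher clauses map: the non-aggregated `p.name` acts as the grouping key, `count(*)` and `collect(m.title)` aggregate per group, and `ORDER BY` and `LIMIT` become `OrderByDescending`, `ThenBy` and `Take`. It is an analogy for reading the query, not how Neo4j executes it; ordinal string comparison for `p.name asc` is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample of (actor, movie) rows matched by (p:Person)-[:ACTED_IN]->(m:Movie).
var actedIn = new List<(string Actor, string Movie)>
{
    ("Tom Hanks", "Forrest Gump"), ("Tom Hanks", "Cast Away"), ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Keanu Reeves", "The Matrix"), ("Keanu Reeves", "The Matrix Reloaded"),
};

// RETURN p.name, count(*), collect(m.title): p.name is the implicit grouping key.
// Then ORDER BY count(*) desc, p.name asc, and LIMIT 10.
var top = actedIn
    .GroupBy(r => r.Actor)
    .Select(g => (Name: g.Key, Count: g.Count(), Movies: g.Select(r => r.Movie).ToList()))
    .OrderByDescending(x => x.Count)
    .ThenBy(x => x.Name, StringComparer.Ordinal)
    .Take(10)
    .ToList();

foreach (var (name, count, movies) in top)
    Console.WriteLine($"{name}: {count} [{string.Join(", ", movies)}]");
// Tom Hanks: 3 [Forrest Gump, Cast Away, Apollo 13]
// Keanu Reeves: 2 [The Matrix, The Matrix Reloaded]
// Kevin Bacon: 1 [Apollo 13]
```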
Neo4j Query Planner
• Cost-based query planner since Neo4j 2.2
• Uses database statistics to select the best plan
• Currently for read operations
• Query plan visualizer finds:
  • Non-optimal queries
  • Cartesian products
  • Missing indexes, global scans
  • Typos
  • Massive fan-out
Query Planner
A slight change, adding a label to the query, makes more statistics available and yields a new plan with fewer database hits.
Neo4j Remoting Protocols
• Cypher HTTP endpoint:
  • Fast
  • Transactional (multi-request)
  • Streaming
  • Batching
  • Parameters
  • Statistics, query plan, result representations
:POST /db/data/transaction/commit {"statements":[{"statement": "MATCH (p:Person) WHERE p.name = {name} RETURN p", "parameters":{"name":"Clint Eastwood"}}]}
• Up next: binary protocol
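A hedged C# sketch of building the request body for the transactional endpoint with `System.Text.Json` (available in .NET Core 3.0 and later). It reproduces the JSON shape from the slide; `BuildCommitBody` is a hypothetical helper. Actually sending the body (for example with `HttpClient`, plus credentials where authentication is enabled) needs a running server, so it is left out. Keeping the value in `parameters` rather than concatenating it into the statement lets Neo4j reuse cached query plans and avoids Cypher injection. The `{name}` placeholder syntax matches the Neo4j 2.x era of this talk; later versions use `$name`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Builds the JSON body for POST /db/data/transaction/commit in the shape shown on the slide:
// a list of statements, each with its own parameter map.
string BuildCommitBody(string statement, Dictionary<string, object> parameters) =>
    JsonSerializer.Serialize(new
    {
        statements = new[] { new { statement, parameters } }
    });

var body = BuildCommitBody(
    "MATCH (p:Person) WHERE p.name = {name} RETURN p",
    new Dictionary<string, object> { ["name"] = "Clint Eastwood" });

Console.WriteLine(body);
```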
Neo4j for .Net Developers
Don’t be afraid or disgusted just because it’s “Java”.
It’s just a database implemented in some language
You’ll rarely see it.
Neo4j for .Net Developers - Installation
• Neo4j Windows Installer was first
• Chocolatey packages for Neo4j
• Upcoming in Neo4j 2.3: full PowerShell support
• Just install Neo4j as a service
• More to come
Neo4j for .Net Developers - Drivers
• Neo4jClient: one of the first Neo4j drivers
  • by Readify (Australia)
  • Uses Neo4j’s HTTP APIs
  • Opinionated
  • Query DSL
• NetGain: a new, thin layer over the APIs
• New drivers for the binary protocol
Neo4j for .Net Developers – Development & Deployment
• Develop
  • on Windows with Visual Studio
  • everywhere with Mono / Xamarin
• Develop locally with a local Neo4j instance
• Deploy to Azure, use provisioned instances
Neo4j on Azure – Hosting / Provisioning
• Hosted Neo4j databases by GrapheneDB
• Just install on a Linux instance
• VM Depot images
• Upcoming: Docker
Single Page WebApp on the Movie Dataset
• Bootstrap
• JavaScript (jQuery)
• 3 JSON HTTP endpoints:
  • Single: /movie/title/The%20Matrix
  • Search: /search?query=Matrix
  • Graph: /graph?limit=100
• Send XHR, render results
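The page itself calls these endpoints with jQuery; as a hedged C# sketch (for example for a test client, which is an assumption), the helpers below build the same three request paths. `MoviePath`, `SearchPath` and `GraphPath` are hypothetical names; the only library call is `Uri.EscapeDataString`, which percent-encodes user input so that "The Matrix" becomes the `%20` form listed above.

```csharp
using System;

// Hypothetical helpers for the three request paths the single-page app uses.
// Uri.EscapeDataString percent-encodes user input, so a space becomes %20.
string MoviePath(string title) => "/movie/title/" + Uri.EscapeDataString(title);
string SearchPath(string query) => "/search?query=" + Uri.EscapeDataString(query);
string GraphPath(int limit) => $"/graph?limit={limit}";

Console.WriteLine(MoviePath("The Matrix")); // /movie/title/The%20Matrix
Console.WriteLine(SearchPath("Matrix"));    // /search?query=Matrix
Console.WriteLine(GraphPath(100));          // /graph?limit=100
```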
Data Model
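// POCOs that Neo4jClient deserializes node properties into; names mirror the graph properties
public class Person
{
    public string name { get; set; }
    public int born { get; set; }
}
public class Movie
{
    public string title { get; set; }
    public int released { get; set; }
    public string tagline { get; set; }
}
[Diagram: (:Person {name, born})-[:ACTED_IN|DIRECTED|…]->(:Movie {title, released, tagline}), e.g. Forrest Gump]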
Setup
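• Add Neo4jClient as a dependency
• Store the graph database URL in Web.config (appSettings key “GraphDBUrl”)
• Connect in WebApiConfig
// In WebApiConfig: one client shared by all controllers (needs System, System.Configuration, Neo4jClient)
public static GraphClient GraphClient { get; private set; }
// ...inside WebApiConfig.Register(HttpConfiguration config):
var url = ConfigurationManager.AppSettings["GraphDBUrl"];
GraphClient = new GraphClient(new Uri(url));
GraphClient.Connect();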
Routes & Controllers
• Provide routes for
  • index.html and
  • the 3 endpoints
• 4 controllers:
  • query with parameters,
  • return results as JSON
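using System.Linq;
using System.Text.RegularExpressions;
using System.Web.Http;
[RoutePrefix("search")]
public class SearchController : ApiController
{
    // GET /search?query=Matrix: Web API binds the query-string key to the parameter with the same name
    [HttpGet]
    [Route("")]
    public IHttpActionResult SearchMoviesByTitle(string query)
    {
        // Case-insensitive "contains" regex; Regex.Escape makes characters such as ( or . match literally
        var data = WebApiConfig.GraphClient.Cypher
            .Match("(m:Movie)")
            .Where("m.title =~ {title}")
            .WithParam("title", "(?i).*" + Regex.Escape(query) + ".*")
            .Return<Movie>("m")
            .Results.ToList();
        // Wrap each result as { movie: ... } for the page's JSON consumer
        return Ok(data.Select(c => new { movie = c }));
    }
}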
Neo4j Clustering Architecture: Optimized for Speed & Availability at Scale
Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across the cluster for very large graphs
Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
[Diagram: a load balancer in front of three Neo4j instances]
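Three Ways to Migrate Data to Neo4j
[Diagram: three application/database setups: MIGRATE ALL DATA (all data moves from the relational database to the graph database), MIGRATE GRAPH DATA (graph data moves to the graph database, non-graph data stays relational), DUPLICATE GRAPH DATA (all data stays in the relational database and graph data is copied to the graph database).]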
Neo4j Fits into Your Enterprise Environment
[Diagram components: End User, Application, Graph Database Cluster (three Neo4j instances), Data Scientist, Ad Hoc Analysis, Bulk Analytic Infrastructure (Graph Compute Engine, EDW, …) and source databases (relational, NoSQL, Hadoop); captions: “Data Storage and Business Rules Execution” and “Data Mining and Aggregation”.]
Resources
Online
• Developer Site: neo4j.com/developer
  • DotNet Page
  • Guide: Cypher
  • Guide: CSV Import
• Courses
  • Pluralsight
  • Wintellect Now
• Reference Manual
• StackOverflow
Offline
• In-Browser Guides
• Training Classes (Intro, Modeling)
• Office Hours
• Professional Services Workshop
• Free e-Books:
  • Graph Databases, 2nd Ed. (O’Reilly)
  • Learning Neo4j
Summary: Introduction to Neo4j & .Net
Neo4j allows you to…
• Keep your rich data model
• Handle relationships efficiently
• Write queries easily
• Develop applications quickly
For .Net Developers
• Neo4j Installer
• Drivers for Neo4j from .Net
• Host Database on Azure
• Deploy Apps to Azure