the opencypher project - an open graph query language
The openCypher project
Michael Hunger
Topics
• Property Graph Model
• Cypher - A language for querying graphs
• Cypher History
• Cypher Demo
• Current implementation in Neo4j
• User Feedback
• Opening up - The openCypher project
• Governance, Contribution Process
• Planned Deliverables
The Property-Graph-Model
You know it, right?
[Figure: example property graph with a CAR node and a DRIVES relationship. Properties shown: name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975; since: Jan 10, 2011; brand: "Volvo", model: "V70"]
Labeled Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure labels: PERSON, PERSON; relationships LOVES, LOVES, LIVES WITH, OWNS]
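The components above can be sketched as a small in-memory data structure. This is only an illustration in Python, not how Neo4j stores data: the class names `Node` and `Relationship`, the helper `outgoing`, and the choice of who loves or drives what are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                   # direction: start -> end
    rel_type: str                                 # e.g. "DRIVES"
    end: Node
    properties: dict = field(default_factory=dict)


# Property values are taken from the slide; the wiring is assumed.
dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car, {"since": "2011-01-10"}),
]


def outgoing(node, rel_type):
    """Return the end nodes of all relationships of rel_type leaving node."""
    return [r.end for r in relationships
            if r.start is node and r.rel_type == rel_type]


driven = outgoing(dan, "DRIVES")
```

Both nodes and relationships carry a property map, while only relationships have a direction and exactly one type.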
Relational Versus Graph Models
[Figure: the same data in both models. Relational Model: tables Person, Person-Friend, Person. Graph Model: nodes ANDREAS, TOBIAS, MICA, DELIA connected by KNOWS relationships]
Cypher Query Language
Why, How, When?
Why Yet Another Query Language (YAQL)?
• SQL and SPARQL hurt our brains
• Our brains crave patterns
• It's all about patterns
• Creating a query language is fun (and hard work)
What is Cypher?
• A graph query language that allows for expressive and efficient querying of graph data
• Intuitive, powerful and easy to learn
• Write graph queries by describing patterns in your data
• Focus on your domain not the mechanics of data access.
• Designed to be a human-readable query language
• Suitable for developers and operations professionals
What is Cypher?
• Cypher is declarative, which means it lets users express what data to retrieve, not how to retrieve it
• The guiding principle behind Cypher is to make simple things easy and complex things possible
• A humane query language
• Stolen from SQL (common keywords), SPARQL (pattern matching), Python and Haskell (collection semantics)
Why Cypher?
Compared to:
• SPARQL (Cypher came from real-world use, not academia)
• Gremlin (declarative vs imperative)
• SQL (graph-specific vs set-specific)
(Cypher)-[:LOVES]->(ASCII Art)
A language should be readable, not just writable. You will read your code dozens more times than you write it. Regexes, for example, are write-only.
Querying the Graph
Some Examples With Cypher
Basic Query: Who do people report to?
MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})
[Diagram annotations: Steven and Andrew are each a NODE with a LABEL (Employee) and a PROPERTY (firstName); REPORTS_TO is the relationship between them]
Basic Query Comparison: Who do people report to?
SELECT *
FROM Employee AS e
JOIN Employee_Report AS er ON (e.id = er.manager_id)
JOIN Employee AS sub ON (er.sub_id = sub.id)

MATCH (e:Employee)-[:REPORTS_TO]->(mgr:Employee)
RETURN *
Cypher Syntax
Only Tip of the Iceberg
Syntax: Patterns
( )-->( )
(node:Label {key:value})
(node1)-[rel:REL_TYPE {key:value}]->(node2)
(node1)-[:REL_TYPE1]->(node2)<-[:REL_TYPE2]-(node3)
(node1)-[:REL_TYPE*m..n]->(node2)
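To make the last pattern concrete, here is a rough Python sketch of what a variable-length pattern such as `(start)-[:REL_TYPE*m..n]->(end)` asks for. It is a simplification: it returns the set of distinct end nodes, whereas Cypher returns one row per matching path and never reuses a relationship within a path. The function name `var_length_ends` and the sample reporting edges are assumptions for illustration.

```python
def var_length_ends(edges, start, rel_type, min_hops, max_hops):
    """Nodes reachable from start by following min_hops..max_hops
    outgoing relationships of the given type."""
    frontier = {start}
    found = set()
    if min_hops == 0:
        found.add(start)          # *0.. also matches the start node itself
    for hops in range(1, max_hops + 1):
        # Advance one hop along relationships of the requested type.
        frontier = {dst for (src, t, dst) in edges
                    if src in frontier and t == rel_type}
        if hops >= min_hops:
            found |= frontier
    return found


# Illustrative edges: (start node, relationship type, end node).
edges = [
    ("Robert", "REPORTS_TO", "Steven"),
    ("Steven", "REPORTS_TO", "Andrew"),
]

one_or_two = var_length_ends(edges, "Robert", "REPORTS_TO", 1, 2)
zero_or_one = var_length_ends(edges, "Robert", "REPORTS_TO", 0, 1)
exactly_two = var_length_ends(edges, "Robert", "REPORTS_TO", 2, 2)
```

In Cypher the same question is a single pattern, for example `(:Employee {firstName: "Robert"})-[:REPORTS_TO*1..2]->(boss)`.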
Patterns are used in
• (OPTIONAL) MATCH
• CREATE, MERGE
• shortestPath()
• Predicates
• Expressions
• (Comprehensions)
Syntax: Structure
(OPTIONAL) MATCH <patterns>
WHERE <predicates>
RETURN <expression> AS <name>
ORDER BY <expression>
SKIP <offset> LIMIT <size>
Syntax: Automatic Aggregation
MATCH <patterns>
RETURN <expr>, collect([distinct] <expression>) AS <name>,
count(*) AS freq
ORDER BY freq DESC
DataFlow: WITH
WITH <expression> AS <name>, ....
• controls data flow between query segments
• separates reads from writes
• can also
  • aggregate
  • sort
  • paginate
• replacement for HAVING
• as many WITHs as you like
Structure: Writes
CREATE <pattern>
MERGE <pattern> ON CREATE ... ON MATCH ...
(DETACH) DELETE <entity>
SET <property,label>
REMOVE <property,label>
Data Import
[USING PERIODIC COMMIT <count>]
LOAD CSV [WITH HEADERS] FROM "URL" AS row
... any Cypher clauses, mostly match + updates ...
Collections
UNWIND (range(1,10) + [11,12,13]) AS x
WITH collect(x) AS coll
WHERE any(x IN coll WHERE x % 2 = 0)
RETURN size(coll), coll[0], coll[1..-1] ,
reduce(a = 0, x IN coll | a + x),
extract(x IN coll | x*x), filter(x IN coll WHERE x > 10),
[x IN coll WHERE x > 10 | x*x ]
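For readers more familiar with Python, the collection expressions above map roughly onto the following. This is a sketch of the equivalent computations only, not of how Cypher evaluates them; the dictionary keys are names chosen for this example.

```python
from functools import reduce

# Cypher's range(1,10) includes the upper bound, so it has ten elements.
coll = list(range(1, 11)) + [11, 12, 13]

# WHERE any(x IN coll WHERE x % 2 = 0)
has_even = any(x % 2 == 0 for x in coll)

result = {
    "size": len(coll),                                 # size(coll)
    "first": coll[0],                                  # coll[0]
    "middle": coll[1:-1],                              # coll[1..-1]
    "sum": reduce(lambda a, x: a + x, coll, 0),        # reduce(a = 0, x IN coll | a + x)
    "squares": [x * x for x in coll],                  # extract(x IN coll | x*x)
    "large": [x for x in coll if x > 10],              # filter(x IN coll WHERE x > 10)
    "large_squares": [x * x for x in coll if x > 10],  # [x IN coll WHERE x > 10 | x*x]
}
```

The last Cypher expression is a list comprehension that combines the filter and extract steps in one construct, much like its Python counterpart.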
Maps & Entities
WITH {age:42, name: "John", male:true} AS data
WHERE exists(data.name) AND data["age"] = 42
CREATE (n:Person) SET n += data
RETURN [k IN keys(n) WHERE k CONTAINS "a"
 | {key: k, value: n[k] } ]
Optional Schema
CREATE INDEX ON :Label(property)
CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE
CREATE CONSTRAINT ON (n:Label) ASSERT exists(n.property)
CREATE CONSTRAINT ON (:Label)-[r:REL]->(:Label2)
ASSERT exists(r.property)
And much more ...
neo4j.com/docs/stable/cypher-refcard
More Examples
MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.firstName = 'Andrew'
RETURN sub.firstName AS Subordinate,
       count(report) AS Total;
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, each up to 3 levels down
Who is in Robert’s (direct, upwards) reporting chain?
MATCH path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN path;
Product Cross-Sell
MATCH (choc:Product {productName: 'Chocolade'})<-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
RETURN employee.firstName, other.productName, count(distinct o2) AS count
ORDER BY count DESC
LIMIT 5;
Neo4j's Cypher Implementation
History of Cypher
• 1.4 - Cypher initially added to Neo4j
• 1.6 - Cypher becomes part of REST API
• 1.7 - Collection functions, global search, pattern predicates
• 1.8 - Write operations
• 1.9 - Type System, Traversal Matcher, Caches, String functions, more powerful WITH, Laziness, Profiling, Execution Plan
• 2.0 - Label support, label based indexes and constraints, MERGE, transactional HTTP endpoint, literal maps, slices, new parser, OPTIONAL MATCH
• 2.1 - LOAD CSV, COST Planner, reduce eagerness, UNWIND, versioning
• 2.2 - COST Planner default, EXPLAIN, PROFILE, visual Query Plan, IDP
• 2.3 -
Try it out!
APIs
• Embedded
  graphDb.execute(query, params);
• HTTP - transactional Cypher endpoint
  :POST /db/data/transaction[/commit] {statements:[{statement: "query", parameters: params, resultDataContents:["row"], includeStats:true}, ...]}
• Bolt - binary protocol
  Driver driver = GraphDatabase.driver( "bolt://localhost" );
  Session session = driver.session();
  Result rs = session.run("CREATE (n) RETURN n");
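As a sketch of the HTTP option, the request body for the transactional endpoint can be assembled like this in Python. Only the JSON shape comes from the slide; the helper name `build_payload`, the sample statement, and the parameter are made up for the example, and no request is actually sent.

```python
import json


def build_payload(statement, parameters=None):
    """Build the JSON body for POST /db/data/transaction/commit."""
    return {
        "statements": [
            {
                "statement": statement,
                "parameters": parameters or {},
                "resultDataContents": ["row"],
                "includeStats": True,
            }
        ]
    }


# Neo4j 2.x wrote parameters in curly braces ({name});
# later versions use $name instead.
payload = build_payload(
    "MATCH (e:Employee {firstName: {name}}) RETURN e",
    {"name": "Steven"},
)
body = json.dumps(payload)
```

Passing values as parameters rather than concatenating them into the statement lets the server reuse the query plan and avoids injection problems.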
Cypher Today - Neo4j Implementation
• Convert the input query into an abstract syntax tree (AST)
• Optimise and normalise the AST (alias expansion, constant folding etc)
• Create a query graph - a high-level, abstract representation of the query - from the normalised AST
• Create a logical plan, consisting of logical operators, from the query graph, using the statistics store to calculate the cost. The cheapest logical plan is selected using IDP (iterative dynamic programming)
• Create an execution plan from the logical plan by choosing a physical implementation for logical operators
• Execute the query
http://neo4j.com/blog/introducing-new-cypher-query-optimizer/
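The idea of picking the cheapest logical plan from statistics can be illustrated with a toy Python example. It is not Neo4j's planner and does not implement IDP: the statistics, the cost formula (estimated rows touched), and the three candidate plans for looking up an Employee by firstName are all invented for the illustration.

```python
# Hypothetical statistics, standing in for the statistics store.
stats = {
    "all_nodes": 10_000,
    "Employee": 200,
    "Employee.firstName_selectivity": 0.01,
}


def candidate_plans(stats, has_index):
    """Map each candidate plan to an estimated cost (rows touched)."""
    plans = {
        "AllNodesScan + Filter": stats["all_nodes"],
        "NodeByLabelScan + Filter": stats["Employee"],
    }
    if has_index:
        plans["NodeIndexSeek"] = (
            stats["Employee"] * stats["Employee.firstName_selectivity"]
        )
    return plans


def cheapest(plans):
    """Return the name of the plan with the lowest estimated cost."""
    return min(plans, key=plans.get)


best_with_index = cheapest(candidate_plans(stats, has_index=True))
best_without_index = cheapest(candidate_plans(stats, has_index=False))
```

A real planner compares far more alternatives, which is why a search strategy such as IDP is needed to keep planning time bounded.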
Neo4j Query Planner
Cost based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
  • Non optimal queries
  • Cartesian Product
  • Missing Indexes, Global Scans
  • Typos
  • Massive Fan-Out
openCypher
An open graph query language
Why?
We love Cypher!
Our users love Cypher.
We want to make everyone happy through using it.
And have Cypher run on their data(base).
We want to collaborate with community and industry partners to create the best graph query language possible!
We love the love
Future of (open)Cypher
• Decouple the language from Neo4j
• Open up and make the language design process transparent
• Encourage use within databases, tools, highlighters, etc.
• Delivery of language docs, tools and implementation
• Governed by the Cypher Language Group (CLG)
CIP (Cypher Improvement Proposal)
• A CIP is a semi-formal specification providing a rationale for new language features and constructs
• Contributions are welcome: submit either a CIP (as a pull request) or a feature request (as an issue) at the openCypher GitHub repository
• See "Resources" for
  • accepted CIPs
  • Contribution Process
  • Template
github.com/opencypher/openCypher
CIP structure
• Sections include:
  • motivation
  • background
  • proposal (including the syntax and semantics)
  • alternatives
  • interactions with existing features
  • benefits
  • drawbacks
• Example of the "STARTS WITH / ENDS WITH / CONTAINS" CIP
Deliverables
✔ Improvement Process
✔ Governing Body
✔ Language grammar (Jan-2016)
• Technology certification kit (TCK)
• Cypher Reference Documentation
• Cypher language specification
• Reference implementation (under Apache 2.0)
• Cypher style guide
• Opening up the CLG
Cypher language specification
• EBNF Grammar
• Railroad diagrams
• Semantic specification
• Licensed under a Creative Commons license
Language Grammar (RELEASED Jan-30-2016)
…
Match = ['OPTIONAL', SP], 'MATCH', SP, Pattern, {Hint}, [Where] ;
Unwind = 'UNWIND', SP, Expression, SP, 'AS', SP, Variable ;
Merge = 'MERGE', SP, PatternPart, {SP, MergeAction} ;
MergeAction = ('ON', SP, 'MATCH', SP, SetClause) | ('ON', SP, 'CREATE', SP, SetClause) ;
...
github.com/opencypher/openCypher/blob/master/grammar.ebnf
Technology Compliance Kit (TCK)
● Validates a Cypher implementation
● Certifies that it complies with a given version of Cypher
● Based on given dataset
● Executes a set of queries and
● Verifies expected outputs
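The TCK idea above can be sketched as a tiny test harness in Python. The real TCK had not been released at the time of this talk, so everything here is an assumption: the scenario format, the `run_tck` function, and the fake `execute` function that stands in for a Cypher implementation under test.

```python
def run_tck(execute, scenarios):
    """Run every scenario query through `execute` and return the
    names of the scenarios whose output differs from the expectation."""
    failures = []
    for scenario in scenarios:
        actual = execute(scenario["query"])
        if actual != scenario["expected"]:
            failures.append(scenario["name"])
    return failures


# Hypothetical scenarios: a query plus the rows it must produce.
scenarios = [
    {"name": "return literal", "query": "RETURN 1 AS x",
     "expected": [{"x": 1}]},
    {"name": "unwind range", "query": "UNWIND range(1,3) AS x RETURN x",
     "expected": [{"x": 1}, {"x": 2}, {"x": 3}]},
]


def fake_execute(query):
    """Stand-in implementation that only knows the first query."""
    canned = {"RETURN 1 AS x": [{"x": 1}]}
    return canned.get(query, [])


failed = run_tck(fake_execute, scenarios)
```

An implementation would be considered compliant with a given Cypher version only when the list of failures for that version's scenarios is empty.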
Cypher Reference Documentation
• Style Guide
• User documentation describing the use of Cypher
• Example datasets with queries
• Tutorials
• GraphGists
Style Guide
• Labels are CamelCase
• Properties and functions are lowerCamelCase
• Keywords and Relationship-Types are ALL_CAPS
• Patterns should be complete and left to right
• Put anchored nodes first
• .... to be released ...
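The naming rules above could be checked mechanically. The following Python sketch uses regular expressions that are one possible reading of those rules; since the style guide had not been released, the patterns and function names are assumptions rather than an official checker.

```python
import re

LABEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")       # CamelCase, e.g. Employee
PROPERTY = re.compile(r"^[a-z][A-Za-z0-9]*$")    # lowerCamelCase, e.g. firstName
REL_TYPE = re.compile(r"^[A-Z][A-Z0-9_]*$")      # ALL_CAPS, e.g. REPORTS_TO


def is_label(name):
    return bool(LABEL.match(name))


def is_property(name):
    return bool(PROPERTY.match(name))


def is_rel_type(name):
    return bool(REL_TYPE.match(name))


checks = {
    "Employee": is_label("Employee"),
    "firstName": is_property("firstName"),
    "REPORTS_TO": is_rel_type("REPORTS_TO"),
    "reportsTo": is_rel_type("reportsTo"),
}
```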
Reference implementation (ASL 2.0)
• A fully functional implementation of key parts of the stack needed to support Cypher inside a platform or tool
• First deliverable: parser taking a Cypher statement and parsing it into an AST (abstract syntax tree)
• Future deliverables:
  • Rule-based query planner
  • Query runtime
• Distributed under the Apache 2.0 license
• Can be used as an example or as an implementation foundation
The Cypher Language Group (CLG)
• The steering committee for language evolution
• Reviews feature requests and proposals (CIP)
• Caretakers of the language
• Focus on guiding principles
• Long term focus, no quick fixes & hacks
• Currently a group of Cypher authors, developers and users
• Publishes meeting minutes -> opencypher.github.io/meeting-minutes/
“Graph processing is becoming an indispensable part of the modern big data stack. Neo4j’s Cypher query language has greatly accelerated graph database adoption.
We are looking forward to bringing Cypher’s graph pattern matching capabilities into the Spark stack, making it easier for masses to access query graph processing.”
- Ion Stoica, CEO & Founder Databricks
“Lots of software systems could be improved by using a graph datastore. One thing holding back the category has been the lack of a widely supported, standard graph query language. We see the appearance of openCypher as an important step towards the broader use of graphs across the industry.”
- Rebecca Parsons, ThoughtWorks, CTO
Some people like it
And support openCypher
Resources
• http://www.opencypher.org/
• https://github.com/opencypher/openCypher
• https://github.com/opencypher/openCypher/blob/master/CONTRIBUTING.adoc
• https://github.com/opencypher/openCypher/tree/master/cip
• https://github.com/opencypher/openCypher/pulls
• http://groups.google.com/group/openCypher
• @openCypher
Please contribute
Feedback, Ideas, Proposals
Implementations
Thank You!
Questions?