The openCypher Project - An Open Graph Query Language

The openCypher project Michael Hunger


Post on 15-Apr-2017


TRANSCRIPT

Page 1: The openCypher Project - An Open Graph Query Language

The openCypher project
Michael Hunger

Page 2: The openCypher Project - An Open Graph Query Language

Topics

• Property Graph Model
• Cypher - A language for querying graphs
• Cypher History
• Cypher Demo
• Current implementation in Neo4j
• User Feedback
• Opening up - The openCypher project
• Governance, Contribution Process
• Planned Deliverables

Page 3: The openCypher Project - An Open Graph Query Language

The Property Graph Model
You know it, right?

Page 4: The openCypher Project - An Open Graph Query Language

Labeled Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

[Figure: two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and one CAR node (brand: "Volvo", model: "V70"), connected by LOVES, LOVES, LIVES WITH, DRIVES and OWNS relationships. One relationship carries the property since: Jan 10, 2011.]
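
As a rough illustration of these components, the labeled property graph can be sketched with two small Python classes. This is not how Neo4j stores data. The class names, the `outgoing` helper, the direction of each relationship and the choice of DRIVES as the carrier of the `since` property are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: an object in the graph with optional labels and properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection from `start` to `end` with properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Property values come from the slide; the relationship directions and the
# relationship that carries `since` are assumptions.
dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Return the end nodes of `node`'s outgoing relationships of one type."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]
```

Calling `outgoing(dan, "DRIVES")` walks from Dan to the car by following the relationship directly, with no join table in between.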

Page 5: The openCypher Project - An Open Graph Query Language

Relational Versus Graph Models

[Figure: the relational model (tables Person, Person-Friend, Person with rows ANDREAS, DELIA, TOBIAS, MICA) side by side with the graph model (nodes ANDREAS, TOBIAS, MICA, DELIA connected by KNOWS relationships).]

Page 6: The openCypher Project - An Open Graph Query Language

Cypher Query Language
Why, How, When?

Page 7: The openCypher Project - An Open Graph Query Language

Why Yet Another Query Language (YAQL)?

• SQL and SPARQL hurt our brains

• Our brains crave patterns

• It's all about patterns

• Creating a query language is fun (and hard work)

Page 8: The openCypher Project - An Open Graph Query Language

What is Cypher?

• A graph query language that allows for expressive and efficient querying of graph data

• Intuitive, powerful and easy to learn

• Write graph queries by describing patterns in your data

• Focus on your domain not the mechanics of data access.

• Designed to be a human-readable query language

• Suitable for developers and operations professionals

Page 9: The openCypher Project - An Open Graph Query Language

What is Cypher?

• Cypher is declarative: users express what data to retrieve, not how to retrieve it

• The guiding principle behind Cypher is to make simple things easy and complex things possible

• A humane query language

• Stolen from SQL (common keywords), SPARQL (pattern matching), Python and Haskell (collection semantics)

Page 10: The openCypher Project - An Open Graph Query Language

Why Cypher?

Compared to:
• SPARQL (Cypher came from real-world use, not academia)
• Gremlin (declarative vs imperative)
• SQL (graph-specific vs set-specific)

(Cypher)-[:LOVES]->(ASCII Art)

A language should be readable, not just writable. You will read your code dozens more times than you write it. Regular expressions, for example, are write-only.

Page 11: The openCypher Project - An Open Graph Query Language

Querying the Graph
Some Examples With Cypher

Page 12: The openCypher Project - An Open Graph Query Language

Basic Query: Who do people report to?

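MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})

[Figure: the pattern annotated. Each parenthesized element is a NODE with a LABEL (Employee) and a PROPERTY (firstName); REPORTS_TO is the relationship from Steven to Andrew.]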

Page 13: The openCypher Project - An Open Graph Query Language

Basic Query Comparison: Who do people report to?

SELECT *
FROM Employee AS e
JOIN Employee_Report AS er ON (e.id = er.manager_id)
JOIN Employee AS sub ON (er.sub_id = sub.id)

MATCH (e:Employee)-[:REPORTS_TO]->(mgr:Employee)
RETURN *

Page 14: The openCypher Project - An Open Graph Query Language

Basic Query: Who do people report to?

Page 15: The openCypher Project - An Open Graph Query Language

Basic Query: Who do people report to?

Page 16: The openCypher Project - An Open Graph Query Language

Cypher Syntax
Only the Tip of the Iceberg

Page 17: The openCypher Project - An Open Graph Query Language

Syntax: Patterns

( )-->( )

(node:Label {key:value})

(node1)-[rel:REL_TYPE {key:value}]->(node2)

(node1)-[:REL_TYPE1]->(node2)<-[:REL_TYPE2]-(node3)

(node1)-[:REL_TYPE*m..n]->(node2)

Page 18: The openCypher Project - An Open Graph Query Language

Patterns are used in

• (OPTIONAL) MATCH

• CREATE, MERGE

• shortestPath()

• Predicates

• Expressions

• (Comprehensions)

Page 19: The openCypher Project - An Open Graph Query Language

Syntax: Structure

(OPTIONAL) MATCH <patterns>

WHERE <predicates>

RETURN <expression> AS <name>

ORDER BY <expression>

SKIP <offset> LIMIT <size>

Page 20: The openCypher Project - An Open Graph Query Language

Syntax: Automatic Aggregation

MATCH <patterns>

RETURN <expr>, collect([distinct] <expression>) AS <name>,

count(*) AS freq

ORDER BY freq DESC
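
Automatic aggregation means that every non-aggregated expression in RETURN acts as a grouping key, so no GROUP BY clause is needed. The Python sketch below imitates that behaviour on a few made-up rows. The row data and the `aggregate` function are hypothetical and only mirror the shape of the query above.

```python
from collections import defaultdict

# Hypothetical rows as a MATCH might produce them: (employee, product) pairs.
rows = [("Ann", "Tea"), ("Dan", "Coffee"), ("Ann", "Coffee"), ("Ann", "Tea")]


def aggregate(rows):
    """Mimic: RETURN employee, collect(DISTINCT product) AS products,
    count(*) AS freq ORDER BY freq DESC."""
    # `employee` is not aggregated, so it implicitly becomes the grouping key.
    groups = defaultdict(lambda: {"products": [], "freq": 0})
    for employee, product in rows:
        group = groups[employee]
        if product not in group["products"]:  # DISTINCT
            group["products"].append(product)
        group["freq"] += 1                    # count(*)
    result = [(e, g["products"], g["freq"]) for e, g in groups.items()]
    return sorted(result, key=lambda r: r[2], reverse=True)
```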

Page 21: The openCypher Project - An Open Graph Query Language

DataFlow: WITH

WITH <expression> AS <name>, ....

• controls data flow between query segments
• separates reads from writes
• can also
  • aggregate
  • sort
  • paginate
• replacement for HAVING
• as many WITHs as you like
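
To see why WITH can stand in for HAVING, it may help to read a query as a pipeline of segments: the first segment aggregates, and the next one filters on the aggregate. The sketch below is a loose Python analogy with invented data and an invented function name, not a description of how Cypher executes.

```python
from collections import Counter

# Hypothetical (manager, report) pairs produced by a first query segment.
pairs = [("Andrew", "Steven"), ("Andrew", "Nancy"), ("Steven", "Robert")]


def managers_with_at_least(pairs, minimum):
    # Segment 1: MATCH ... WITH manager, count(*) AS reports
    counts = Counter(manager for manager, _ in pairs)
    # Segment 2: WHERE reports >= minimum
    #            RETURN manager, reports ORDER BY reports DESC
    kept = [(m, c) for m, c in counts.items() if c >= minimum]
    return sorted(kept, key=lambda mc: mc[1], reverse=True)
```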

Page 22: The openCypher Project - An Open Graph Query Language

Structure: Writes

CREATE <pattern>

MERGE <pattern> ON CREATE ... ON MATCH ...

(DETACH) DELETE <entity>

SET <property,label>

REMOVE <property,label>

Page 23: The openCypher Project - An Open Graph Query Language

Data Import

[USING PERIODIC COMMIT <count>]

LOAD CSV [WITH HEADERS] FROM "URL" AS row

... any Cypher clauses, mostly match + updates ...

Page 24: The openCypher Project - An Open Graph Query Language

Collections

UNWIND (range(1,10) + [11,12,13]) AS x

WITH collect(x) AS coll

WHERE any(x IN coll WHERE x % 2 = 0)

RETURN size(coll), coll[0], coll[1..-1] ,

reduce(a = 0, x IN coll | a + x),

extract(x IN coll | x*x), filter(x IN coll WHERE x > 10),

[x IN coll WHERE x > 10 | x*x ]
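
Since the collection semantics are borrowed from Python, the query above maps almost one-to-one onto Python list operations. The following sketch is such a translation. The dictionary keys are made up for readability, and note that Cypher's range() includes its upper bound while Python's does not.

```python
from functools import reduce

# UNWIND (range(1,10) + [11,12,13]) AS x  WITH collect(x) AS coll
# Cypher's range(1,10) is inclusive, hence range(1, 11) in Python.
coll = list(range(1, 11)) + [11, 12, 13]

# WHERE any(x IN coll WHERE x % 2 = 0)
has_even = any(x % 2 == 0 for x in coll)

result = {
    "size": len(coll),                                  # size(coll)
    "first": coll[0],                                   # coll[0]
    "middle": coll[1:-1],                               # coll[1..-1]
    "sum": reduce(lambda a, x: a + x, coll, 0),         # reduce(a = 0, x IN coll | a + x)
    "squares": [x * x for x in coll],                   # extract(x IN coll | x*x)
    "large": [x for x in coll if x > 10],               # filter(x IN coll WHERE x > 10)
    "large_squares": [x * x for x in coll if x > 10],   # [x IN coll WHERE x > 10 | x*x]
}
```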

Page 25: The openCypher Project - An Open Graph Query Language

Maps & Entities

WITH {age: 42, name: "John", male: true} AS data

WHERE exists(data.name) AND data["age"] = 42

CREATE (n:Person) SET n += data

RETURN [k IN keys(n) WHERE k CONTAINS "a" | {key: k, value: n[k]}]

Page 26: The openCypher Project - An Open Graph Query Language

Optional Schema

CREATE INDEX ON :Label(property)

CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE

CREATE CONSTRAINT ON (n:Label) ASSERT exists(n.property)

CREATE CONSTRAINT ON (:Label)-[r:REL]->(:Label2)

ASSERT exists(r.property)

Page 27: The openCypher Project - An Open Graph Query Language

And much more ...

neo4j.com/docs/stable/cypher-refcard

Page 28: The openCypher Project - An Open Graph Query Language

More Examples

Page 29: The openCypher Project - An Open Graph Query Language

Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, each up to 3 levels down

Cypher Query

MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.firstName = 'Andrew'
RETURN sub.firstName AS Subordinate, count(report) AS Total;

SQL Query

[SQL query not captured in the transcript]

Page 30: The openCypher Project - An Open Graph Query Language

Who is in Robert’s (direct, upwards) reporting chain?

MATCH path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN path;
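
The variable-length pattern `[:REPORTS_TO*]` matches chains of one or more hops. A minimal Python sketch of what that traversal amounts to is shown below. The employee data is invented, and the sketch assumes each employee reports to at most one manager, which the Cypher query itself does not require.

```python
# Hypothetical REPORTS_TO relationships: employee -> manager.
reports_to = {"Robert": "Steven", "Steven": "Andrew", "Nancy": "Andrew"}


def reporting_paths(start):
    """Return one path per chain length, starting at `start` and going upwards."""
    paths = []
    path = [start]
    seen = {start}
    while path[-1] in reports_to:
        manager = reports_to[path[-1]]
        if manager in seen:  # guard against cycles in malformed data
            break
        seen.add(manager)
        path = path + [manager]  # extend the chain by one hop
        paths.append(path)       # each prefix of the chain is a matching path
    return paths
```

For `reporting_paths("Robert")` this yields two paths, one ending at Steven and one ending at Andrew, just as the Cypher query returns one path per matching chain length.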

Page 31: The openCypher Project - An Open Graph Query Language

Who is in Robert’s (direct, upwards) reporting chain?

Page 32: The openCypher Project - An Open Graph Query Language

Product Cross-Sell

MATCH (choc:Product {productName: 'Chocolade'})<-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
RETURN employee.firstName, other.productName, count(DISTINCT o2) AS count
ORDER BY count DESC
LIMIT 5;

Page 33: The openCypher Project - An Open Graph Query Language

Product Cross-Sell

Page 34: The openCypher Project - An Open Graph Query Language

Neo4j's Cypher Implementation

Page 35: The openCypher Project - An Open Graph Query Language

History of Cypher

• 1.4 - Cypher initially added to Neo4j
• 1.6 - Cypher becomes part of REST API
• 1.7 - Collection functions, global search, pattern predicates
• 1.8 - Write operations
• 1.9 - Type system, traversal matcher, caches, string functions, more powerful WITH, laziness, profiling, execution plan
• 2.0 - Label support, label-based indexes and constraints, MERGE, transactional HTTP endpoint, literal maps, slices, new parser, OPTIONAL MATCH
• 2.1 - LOAD CSV, COST planner, reduced eagerness, UNWIND, versioning
• 2.2 - COST planner by default, EXPLAIN, PROFILE, visual query plan, IDP
• 2.3 -

Page 36: The openCypher Project - An Open Graph Query Language

Try it out!

Page 37: The openCypher Project - An Open Graph Query Language

APIs
• Embedded
  • graphDb.execute(query, params);
• HTTP – transactional Cypher endpoint
  • :POST /db/data/transaction[/commit] {statements:[{statement: "query", parameters: params, resultDataContents:["row"], includeStats:true},....]}
• Bolt – binary protocol
  • Driver driver = GraphDatabase.driver( "bolt://localhost" );
    Session session = driver.session();
    Result rs = session.run("CREATE (n) RETURN n");
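
For the HTTP endpoint, the request body is plain JSON, so it can be assembled with any JSON library. The sketch below builds the path and body shown on the slide without sending anything over the network. The function name `build_request` and its parameters are assumptions made for this example.

```python
import json


def build_request(query, params=None, commit=True):
    """Build the path and JSON body for the transactional Cypher endpoint."""
    # Appending /commit asks the server to commit right after the statements.
    path = "/db/data/transaction" + ("/commit" if commit else "")
    body = {
        "statements": [
            {
                "statement": query,
                "parameters": params or {},
                "resultDataContents": ["row"],
                "includeStats": True,
            }
        ]
    }
    return path, json.dumps(body)
```

Passing the values through `parameters` instead of concatenating them into the query string keeps the statement text constant across calls.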

Page 38: The openCypher Project - An Open Graph Query Language

Cypher Today - Neo4j Implementation

• Convert the input query into an abstract syntax tree (AST)
• Optimise and normalise the AST (alias expansion, constant folding, etc.)
• Create a query graph, a high-level, abstract representation of the query, from the normalised AST
• Create a logical plan, consisting of logical operators, from the query graph, using the statistics store to calculate the cost. The cheapest logical plan is selected using IDP (iterative dynamic programming)
• Create an execution plan from the logical plan by choosing a physical implementation for each logical operator
• Execute the query

http://neo4j.com/blog/introducing-new-cypher-query-optimizer/
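
Of the steps above, constant folding is the easiest to show in isolation. The following toy sketch folds constant sub-expressions in a tiny hand-made AST. The tuple-based AST format and the `fold` function are invented for illustration and bear no relation to Neo4j's actual AST classes.

```python
# A toy AST: numbers are ints, variables are strings, and operations are
# tuples of the form (operator, left, right).
OPERATORS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def fold(ast):
    """Replace sub-expressions made only of constants with their value."""
    if not isinstance(ast, tuple):
        return ast  # a constant or a variable: nothing to fold
    op, left, right = ast
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPERATORS[op](left, right)  # both sides known: compute now
    return (op, left, right)
```

An expression such as `x + 2 * 3` becomes `x + 6` before planning, so the multiplication is not repeated for every row.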

Page 39: The openCypher Project - An Open Graph Query Language

Cypher Today - Neo4j Implementation

Page 40: The openCypher Project - An Open Graph Query Language

Neo4j Query Planner

Cost-based query planner since Neo4j 2.2
• Uses database statistics to select the best plan
• Currently for read operations
• Query Plan Visualizer, finds
  • Non-optimal queries
  • Cartesian products
  • Missing indexes, global scans
  • Typos
  • Massive fan-out
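
The core idea of cost-based planning, picking the candidate with the lowest estimated cost given the statistics, can be sketched in a few lines. The statistics keys and the cost functions below are invented, and the sketch does not implement IDP. Only the three operator names are borrowed from Neo4j's plan output.

```python
# Hypothetical statistics, as a statistics store might report them.
stats = {"nodes": 1_000_000, "label:Employee": 5_000, "index:Employee(firstName)": 3}

# Hypothetical candidate plans, each with an estimate of the rows it touches.
candidates = {
    "AllNodesScan": lambda s: s["nodes"],
    "NodeByLabelScan": lambda s: s["label:Employee"],
    # Without an index entry in the statistics, the seek is not available.
    "NodeIndexSeek": lambda s: s.get("index:Employee(firstName)", float("inf")),
}


def cheapest_plan(candidates, stats):
    """Return the name of the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda name: candidates[name](stats))
```

Dropping the index entry from `stats` makes the label scan the cheapest option, which is the kind of "missing index" situation the plan visualizer helps to spot.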

Page 41: The openCypher Project - An Open Graph Query Language

openCypher
An open graph query language

Page 42: The openCypher Project - An Open Graph Query Language

Why ?

We love Cypher!

Our users love Cypher.

We want to make everyone happy through using it.

And have Cypher run on their data(base).

We want to collaborate with community and industry partners to

create the best graph query language possible!

Michael Hunger
what are the guiding principles around openCypher, what about governance, decision making, processes?
Petra Selmer
The final decision - regarding PRs of CIPs, feature reqs, RI artifacts etc - will be made by the CLG. I am unsure as to whether this is something that you want to have written on the slide, though. The idea is to open up the "body", but we have not yet discussed this in enough depth to give more specifics. We've also toyed with opening up the CLG, but nothing further, and certainly nothing we want to commit to at this early stage.
Petra Selmer
Re processes, this is (will be covered in the CIP slides)
Petra Selmer
I have actually put in a point about the CLG and governance after all.... Please amend the point if it sounds too... controlling! Philip has made mention of the fact that the CLG must not appear to be "gatekeepers", so we have to tread very lightly here.
Petra Selmer
I have now published some CLG minutes, but not sure whether it is useful to show this here... so up to you. If you do decide it is useful, the link is https://opencypher.github.io/meeting-minutes/
Michael Hunger
what are the goals
Petra Selmer
I've moved the "Goals" slide to just after this one.. does this address the comment?
Page 43: The openCypher Project - An Open Graph Query Language

We love the love

Page 44: The openCypher Project - An Open Graph Query Language

Future of (open)Cypher

• Decouple the language from Neo4j

• Open up and make the language design process transparent

• Encourage use within databases, tools, highlighters, etc.

• Delivery of language docs, tools and implementation

• Governed by the Cypher Language Group (CLG)

Page 45: The openCypher Project - An Open Graph Query Language

CIP (Cypher Improvement Proposal)
• A CIP is a semi-formal specification providing a rationale for new language features and constructs
• Contributions are welcome: submit either a CIP (as a pull request) or a feature request (as an issue) at the openCypher GitHub repository
• See "Resources" for
  • accepted CIPs
  • Contribution Process
  • Template

github.com/opencypher/openCypher

Page 46: The openCypher Project - An Open Graph Query Language

CIP structure
• Sections include:
  • motivation
  • background
  • proposal (including the syntax and semantics)
  • alternatives
  • interactions with existing features
  • benefits
  • drawbacks
• Example of the "STARTS WITH / ENDS WITH / CONTAINS" CIP

Page 47: The openCypher Project - An Open Graph Query Language

Deliverables

✔ Improvement Process
✔ Governing Body
✔ Language grammar (Jan-2016)

• Technology certification kit (TCK)
• Cypher Reference Documentation
• Cypher language specification
• Reference implementation (under Apache 2.0)
• Cypher style guide
• Opening up the CLG

Page 48: The openCypher Project - An Open Graph Query Language

Cypher language specification

• EBNF Grammar

• Railroad diagrams

• Semantic specification

• Licensed under a Creative Commons license

Page 49: The openCypher Project - An Open Graph Query Language

Language Grammar (RELEASED Jan-30-2016)

…

Match = ['OPTIONAL', SP], 'MATCH', SP, Pattern, {Hint}, [Where] ;

Unwind = 'UNWIND', SP, Expression, SP, 'AS', SP, Variable ;

Merge = 'MERGE', SP, PatternPart, {SP, MergeAction} ;

MergeAction = ('ON', SP, 'MATCH', SP, SetClause) | ('ON', SP, 'CREATE', SP, SetClause) ;

...

github.com/opencypher/openCypher/blob/master/grammar.ebnf

Page 50: The openCypher Project - An Open Graph Query Language

Technology Compliance Kit (TCK)

● Validates a Cypher implementation

● Certifies that it complies with a given version of Cypher

● Based on given dataset

● Executes a set of queries and

● Verifies expected outputs
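
One way to picture such a kit is as a small harness that feeds scenarios to an implementation and compares the results. The sketch below is only a guess at the general shape. The scenario format, the `execute(setup, query)` interface, the order-insensitive comparison and the stand-in engine are all assumptions, since the TCK design had not been published at the time of the talk.

```python
def run_tck(execute, scenarios):
    """Run each scenario against `execute(setup, query)` and compare results.

    Rows are compared as unordered collections, which is an assumption.
    """
    report = {}
    for name, scenario in scenarios.items():
        actual = execute(scenario["setup"], scenario["query"])
        expected = scenario["expected"]
        report[name] = sorted(map(repr, actual)) == sorted(map(repr, expected))
    return report


# A hypothetical scenario: a dataset, a query and the expected rows.
scenarios = {
    "count nodes": {
        "setup": "CREATE (), ()",
        "query": "MATCH (n) RETURN count(n) AS c",
        "expected": [{"c": 2}],
    }
}


def fake_engine(setup, query):
    # Stands in for a real Cypher implementation; it always answers 2.
    return [{"c": 2}]
```

Because the harness only sees queries going in and rows coming out, it stays independent of the language or stack of the implementation under test.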

Michael Hunger
how would that work? checking the outputs? on a textual basis? i.e. completely independent of implementation stack/language?
Petra Selmer
I think at this stage any more detail may back us into a corner... if this was further long, we could say more about it. However, we're just at the beginning..
Page 51: The openCypher Project - An Open Graph Query Language

Cypher Reference Documentation

• Style Guide

• User documentation describing the use of Cypher

• Example datasets with queries

• Tutorials

• GraphGists

Page 52: The openCypher Project - An Open Graph Query Language

Style Guide

• Labels are CamelCase

• Properties and functions are lowerCamelCase

• Keywords and Relationship-Types are ALL_CAPS

• Patterns should be complete and left to right

• Put anchored nodes first

• .... to be released ...
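
The three naming rules are simple enough to check mechanically. The sketch below does so with regular expressions. The patterns are loose approximations written for this example (an all-uppercase label such as EMPLOYEE would still pass the CamelCase check), and `follows_style` is a hypothetical helper, not part of any openCypher tooling.

```python
import re

# Loose approximations of the naming rules from the style guide.
RULES = {
    "label": re.compile(r"^[A-Z][A-Za-z0-9]*$"),             # CamelCase
    "property": re.compile(r"^[a-z][A-Za-z0-9]*$"),          # lowerCamelCase
    "relationship_type": re.compile(r"^[A-Z][A-Z0-9_]*$"),   # ALL_CAPS
}


def follows_style(kind, name):
    """Return True if `name` matches the naming rule for `kind`."""
    return bool(RULES[kind].match(name))
```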

Page 53: The openCypher Project - An Open Graph Query Language

Reference implementation (ASL 2.0)

• A fully functional implementation of key parts of the stack needed to support Cypher inside a platform or tool

• First deliverable: parser taking a Cypher statement and parsing it into an AST (abstract syntax tree)

• Future deliverables:
  • Rule-based query planner
  • Query runtime
• Distributed under the Apache 2.0 license
• Can be used as an example or as an implementation foundation

Page 54: The openCypher Project - An Open Graph Query Language

The Cypher Language Group (CLG)

• The steering committee for language evolution

• Reviews feature requests and proposals (CIP)

• Caretakers of the language

• Focus on guiding principles

• Long term focus, no quick fixes & hacks

• Currently a group of Cypher authors, developers and users

• Publishes meeting minutes: opencypher.github.io/meeting-minutes/

Page 55: The openCypher Project - An Open Graph Query Language

Some people like it

“Graph processing is becoming an indispensable part of the modern big data stack. Neo4j’s Cypher query language has greatly accelerated graph database adoption.

We are looking forward to bringing Cypher’s graph pattern matching capabilities into the Spark stack, making it easier for masses to access query graph processing.”

- Ion Stoica, CEO & Founder Databricks

“Lots of software systems could be improved by using a graph datastore. One thing holding back the category has been the lack of a widely supported, standard graph query language. We see the appearance of openCypher as an important step towards the broader use of graphs across the industry.”

- Rebecca Parsons, ThoughtWorks, CTO

Page 56: The openCypher Project - An Open Graph Query Language

And support openCypher

Page 57: The openCypher Project - An Open Graph Query Language

Resources

• http://www.opencypher.org/

• https://github.com/opencypher/openCypher

• https://github.com/opencypher/openCypher/blob/master/CONTRIBUTING.adoc

• https://github.com/opencypher/openCypher/tree/master/cip

• https://github.com/opencypher/openCypher/pulls

• http://groups.google.com/group/openCypher

• @openCypher

Page 58: The openCypher Project - An Open Graph Query Language

Please contribute
Feedback, Ideas, Proposals, Implementations

Thank You! Questions?