Word Puzzles with Neo4j and Py2neo

Presented by Grant Paton-Simpson


Post on 16-Apr-2017



Overview

● Brief look at graph databases & Neo4j

● Introduction to word transformation game

● Getting suitable words

● Adding words and relationships into Neo4j

● Querying graph data to generate puzzles

Graph Databases – a NoSQL option

http://neo4j.com/books/graph-databases/

NoSQL – when is it a good fit?

● SQL has its origins in the 1970s and may not be fresh and shiny any more but ...

● … we shouldn't choose NoSQL for reasons of fashion.

● Venerable SQL often a better choice for standard hierarchies e.g. countries that have cities that have suburbs etc

https://twitter.com/edd/status/400190499585544192

Graph Databases

● Graph databases much, much better for related data with:

– lots of different links between same nodes

– different numbers of links between nodes e.g. 3 hops to one peer and 7 hops to another

– lots of peer-to-peer links

Substantial Benefits

● Massive performance benefits (going exponential as number of links grows)

● Structural harmony

– between structure of data and structure of data storage (what you draw on the whiteboard might look very similar to how your data is actually structured)

– between questions of data and query language used to answer them

Word transformations

● Start with one word and get to the other by single-letter transformations word-by-word

● E.g. starting with “stores” get to “slaked”

– BTW there are 96 alternative ways in 5 moves or fewer

stores

stored

stared

staked

slaked
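
The rule of the game can be checked mechanically. The following is a minimal sketch, assuming that a "single-letter transformation" means replacing one letter so that word length never changes (which is what the example above does). The function names are illustrative and do not come from the talk.

```python
from typing import Sequence


def differs_by_one_letter(a: str, b: str) -> bool:
    """True when a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(1 for x, y in zip(a, b) if x != y) == 1


def is_valid_ladder(words: Sequence[str]) -> bool:
    """True when every consecutive pair of words differs by a single letter."""
    return all(differs_by_one_letter(a, b) for a, b in zip(words, words[1:]))


ladder = ["stores", "stored", "stared", "staked", "slaked"]
print(is_valid_ladder(ladder))  # True
```

This only checks the shape of a solution; a real puzzle checker would also confirm that each intermediate word is in the accepted word list.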

Puzzle taster

Get from 'sloven' to 'closed' in no more than 5 steps (there are 10 unique solutions)

sloven

?

closed

Getting a simple word list

● How hard could it be?

● Lesson #1 – scrabble lists and similar are useless – only want lists with standard words otherwise puzzles too hard

● Lesson #2 – have to decide about taboo/profane words

● Lesson #3 – the number of words affects the number of ONE_LETTER_DIFF relationships a lot

● Lesson #4 – clever optimisation not needed if restricting self to ordinary words

SCOWL (Spell Checker Oriented Word Lists) http://wordlist.aspell.net/

Filtering words

● Needed to turn é to e

● Needed to eliminate possessives e.g. cat's (as used in the phrase “the cat's whiskers”)

● Needed to leave out capitalised words
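
The three filters above could be sketched as follows. This is an illustration rather than the code used in the talk: it assumes the raw list arrives as an iterable of strings, treats any word containing an apostrophe as a possessive (which also drops contractions), and treats any word with an uppercase letter as capitalised.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Turn accented letters into their unaccented base letter (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_words(raw_words):
    """Yield the words that pass the three filters, with accents removed."""
    for word in raw_words:
        word = word.strip()
        if not word:
            continue
        if "'" in word:           # assumption: an apostrophe means a possessive
            continue
        if word != word.lower():  # capitalised words such as proper nouns
            continue
        yield strip_accents(word)


sample = ["cat", "cat's", "café", "London", "stores"]
print(list(clean_words(sample)))  # ['cat', 'cafe', 'stores']
```

Removing accents can turn two different entries into the same word, so collecting the results into a set before loading them may be worthwhile.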

For each word, identifying words different by one letter only

Disclaimer: the code worked but probably some super-smart optimisations would be possible involving n-dimensional space or something
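
The slides do not show how the pairs were found, so the following is not the presenter's code. It is one common alternative to comparing every word with every other word: replace each position in turn with a wildcard and group words that share the resulting key. It assumes words never contain an underscore; the function name is illustrative.

```python
from collections import defaultdict
from itertools import combinations


def one_letter_pairs(words):
    """Return the set of (a, b) pairs, a < b, that differ in exactly one position."""
    buckets = defaultdict(set)
    for word in set(words):
        for i in range(len(word)):
            # Mask position i; words sharing a key agree everywhere else,
            # so any two distinct words in a bucket are one letter apart.
            key = word[:i] + "_" + word[i + 1:]
            buckets[key].add(word)
    pairs = set()
    for group in buckets.values():
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


words = ["stores", "stored", "stared", "staked", "slaked", "closed"]
for a, b in sorted(one_letter_pairs(words)):
    print(a, b)
```

Each pair returned here would become one relationship between two word nodes.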

Adding data to Neo4j

● Create nodes and relationships

● Lots of room for optimisations

● Only need to build database once so 15 minutes is not worth reducing

● My Neo4j and Py2neo is beginner level but I was able to solve my problem
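
As a rough sketch of the "create nodes and relationships" step, the code below only builds Cypher text and parameter dictionaries; it does not talk to a database. The Word label and ONE_LETTER_DIFF relationship type appear in the slides, but the `name` property, the use of MERGE, and the `$param` placeholder style (older Neo4j versions used `{param}`) are assumptions. How the statements are sent through Py2neo depends on the Py2neo version, so that part is deliberately left out.

```python
# Cypher text only. The "name" property and the use of MERGE are assumptions.
CREATE_WORD = "MERGE (w:Word {name: $name})"
LINK_WORDS = (
    "MATCH (a:Word {name: $a}), (b:Word {name: $b}) "
    "MERGE (a)-[:ONE_LETTER_DIFF]-(b)"
)


def build_statements(words, pairs):
    """Return (statement, parameters) tuples: all nodes first, then relationships."""
    statements = [(CREATE_WORD, {"name": w}) for w in sorted(set(words))]
    statements += [(LINK_WORDS, {"a": a, "b": b}) for a, b in sorted(pairs)]
    return statements


for statement, params in build_statements(
    ["stores", "stored"], [("stored", "stores")]
):
    print(statement, params)
```

Creating every node before any relationship means the MATCH in the second statement always finds both words.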

Py2neo and Cypher

Cypher Syntax as ASCII Art (Really!)

[Diagram: two Word nodes joined by a ONE_OFF relationship]

(Word) -[ONE_OFF]->(Word)


How cool is this?
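
To show how the pattern extends to puzzle solving, here is a hedged sketch. `PATHS_QUERY` is an assumed variable-length Cypher query for "all routes of at most 5 hops"; the `name` property and parameter names are illustrative, and ONE_OFF follows the pattern above (an earlier slide calls the relationship ONE_LETTER_DIFF). The Python function mimics the same idea on an in-memory dictionary so it can run without a database. It never revisits a word within a route, which is slightly stricter than Cypher's rule of not reusing a relationship within a path.

```python
# Assumed Cypher for routes of at most 5 hops; names are illustrative.
PATHS_QUERY = (
    "MATCH p = (a:Word {name: $start})-[:ONE_OFF*..5]-(b:Word {name: $end}) "
    "RETURN p"
)


def paths_within(graph, start, end, max_steps):
    """Return every route from start to end that uses at most max_steps hops
    and never revisits a word."""
    found = []

    def walk(path):
        current = path[-1]
        if current == end:
            found.append(list(path))
            return
        if len(path) - 1 == max_steps:
            return  # hop budget used up
        for neighbour in sorted(graph.get(current, ())):
            if neighbour not in path:
                path.append(neighbour)
                walk(path)
                path.pop()

    walk([start])
    return found


graph = {
    "stores": {"stored"},
    "stored": {"stores", "stared"},
    "stared": {"stored", "staked"},
    "staked": {"stared", "slaked"},
    "slaked": {"staked"},
}
for route in paths_within(graph, "stores", "slaked", 5):
    print(" -> ".join(route))  # stores -> stored -> stared -> staked -> slaked
```

On this tiny graph there is a single route; on the full word graph the same search would return every alternative within the step limit.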

Example Output

Matching chart

Live Demo – Suggestions for Start Word

“sloven” to “closed” solution(s)

Resources

● Neo4j

– http://neo4j.com/books/graph-databases/

– http://neo4j.com/graphacademy/

– http://graphgist.neo4j.com/#!/gists

– https://www.youtube.com/channel/UCvze3hU6OZBkB1vkhH2lH9Q

● Py2neo

– http://py2neo.org/2.0/

● SCOWL

– http://wordlist.aspell.net/

About Catalyst