Word puzzles with Neo4j and py2neo
Overview
● Brief look at graph databases & Neo4j
● Introduction to word transformation game
● Getting suitable words
● Adding words and relationships into Neo4j
● Querying graph data to generate puzzles
NoSQL – when is it a good fit?
● SQL has its origins in the 1970s
and may not be fresh and shiny
any more but ...
● … we shouldn't choose NoSQL
for reasons of fashion.
● Venerable SQL often a better
choice for standard hierarchies
e.g. countries that have cities
that have suburbs etc
Graph Databases
● Graph databases much, much better for related data with:
– lots of different links between same nodes
– different numbers of links between nodes
e.g. 3 hops to one peer and 7 hops to another
– lots of peer-to-peer links
Substantial Benefits
● Massive performance benefits (going exponential as number
of links grows)
● Structural harmony
– between structure of data and structure of data storage
(what you draw on the whiteboard might look very similar
to how your data is actually structured)
– between questions of data and query language used to
answer them
Word transformations
● Start with one word and get to
the other by single-letter
transformations, one word at a time
● E.g. starting with “stores” get to
“slaked”
– BTW there are 96 alternative
ways in 5 moves or fewer
stores
stored
stared
staked
slaked
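A minimal sketch of how alternative solutions could be counted, using a tiny in-memory graph rather than Neo4j. The seven-word list and the function names (`one_letter_apart`, `build_graph`, `find_ladders`) are made up for illustration, so the result is not expected to match the figure of 96, which depends on the full word list used in the talk.

```python
from collections import defaultdict


def one_letter_apart(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_graph(words):
    """Naive all-pairs adjacency map; fine for a handful of words."""
    graph = defaultdict(set)
    for a in words:
        for b in words:
            if one_letter_apart(a, b):
                graph[a].add(b)
    return graph


def find_ladders(graph, start, goal, max_moves):
    """Return every path from start to goal that takes at most max_moves
    single-letter moves and never revisits a word."""
    ladders = []

    def extend(path):
        if path[-1] == goal:
            ladders.append(list(path))
            return
        if len(path) - 1 == max_moves:
            return  # move budget used up
        for nxt in sorted(graph[path[-1]]):
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return ladders


# Illustrative word list only (an assumption, not the talk's data).
words = ["stores", "stored", "stared", "staked", "slaked", "stokes", "stoked"]
graph = build_graph(words)
ladders = find_ladders(graph, "stores", "slaked", max_moves=5)
for ladder in ladders:
    print(" -> ".join(ladder))
```

A depth-limited search like this is easy to reason about on a small list; on a full word list the same question is the kind of variable-length path query a graph database is designed for.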
Puzzle taster
Get from 'sloven' to 'closed' in
no more than 5 steps
(there are 10 unique solutions)
sloven
?
closed
Getting a simple word list
● How hard could it be?
● Lesson #1 – Scrabble lists and similar are useless – only want lists
of standard words, otherwise the puzzles are too hard
● Lesson #2 – have to decide about taboo/profane words
● Lesson #3 – the number of words affects the number of
ONE_LETTER_DIFF relationships a lot
● Lesson #4 – clever optimisation not needed if restricting self to
ordinary words
SCOWL (Spell Checker Oriented Word Lists) http://wordlist.aspell.net/
Filtering words
● Needed to turn é to e
● Needed to eliminate possessives e.g. cat's (as used in the phrase “the
cat's whiskers”)
● Needed to leave out capitalised words
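The three filtering rules above could be sketched roughly as follows. This is not the talk's code: `strip_accents` and `clean_words` are hypothetical names, and the optional `length` filter is an added assumption (the example puzzles all use six-letter words).

```python
import unicodedata


def strip_accents(word):
    """Turn accented letters into their plain form, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_words(raw_words, length=None):
    """Apply the filtering rules to an iterable of raw word-list entries."""
    kept = set()
    for raw in raw_words:
        word = raw.strip()
        if not word or "'" in word:
            continue  # drop blanks and possessives such as "cat's"
        if word != word.lower():
            continue  # drop capitalised words (names, acronyms)
        word = strip_accents(word)
        if not (word.isalpha() and word.isascii()):
            continue  # drop anything still containing non-letters
        if length is not None and len(word) != length:
            continue  # assumed extra filter: keep a single word length
        kept.add(word)
    return sorted(kept)


sample = ["café", "cat's", "Paris", "stores\n", "stores", "sloven"]
print(clean_words(sample))            # all lengths
print(clean_words(sample, length=6))  # six-letter words only
```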
For each word, identifying words different by one letter only
Disclaimer: the code worked but probably some super-smart optimisations
would be possible involving n-dimensional space or something
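One common way to avoid comparing every word with every other word is to bucket words by "wildcard" patterns. The sketch below shows that technique; it is not the code from the talk, and the name `one_letter_pairs` is made up. It assumes words are purely alphabetic, so "_" can safely stand in for the blanked-out letter.

```python
from collections import defaultdict
from itertools import combinations


def one_letter_pairs(words):
    """Return sorted (a, b) pairs of words that differ in exactly one position."""
    buckets = defaultdict(list)
    for word in set(words):
        for i in range(len(word)):
            # "stores" contributes "_tores", "s_ores", "st_res", ...
            pattern = word[:i] + "_" + word[i + 1:]
            buckets[pattern].append(word)

    pairs = set()
    for group in buckets.values():
        # Words sharing a pattern are identical except at the blanked position.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return sorted(pairs)


pairs = one_letter_pairs(["stores", "stored", "stared", "staked", "slaked", "closed"])
for a, b in pairs:
    print(a, "<->", b)
```

Each word is touched once per letter, so the work grows with the number of words times word length rather than with the square of the number of words. Each pair corresponds to one ONE_LETTER_DIFF relationship.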
Adding data to Neo4j
● Create nodes and relationships
● Lots of room for optimisations
● Only need to build the database once, so the 15-minute build time is
not worth reducing
● My Neo4j and Py2neo skills are beginner level, but I was able to solve
my problem
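A hedged sketch of how the load could be organised as batched Cypher statements. Only the relationship type ONE_LETTER_DIFF comes from the talk; the `Word` label, the `name` property, the batch size and the helper names are assumptions. The `$param` placeholder syntax is the one used by newer Neo4j releases, and older releases used a different parameter syntax. The sketch only builds the statements and parameters; how they are sent depends on the py2neo version, so that step is left to the driver documentation.

```python
# Cypher statements (assumed schema: (:Word {name})-[:ONE_LETTER_DIFF]->(:Word)).
CREATE_WORDS = "UNWIND $names AS name MERGE (:Word {name: name})"
CREATE_LINKS = (
    "UNWIND $pairs AS pair "
    "MATCH (a:Word {name: pair[0]}), (b:Word {name: pair[1]}) "
    "MERGE (a)-[:ONE_LETTER_DIFF]->(b)"
)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_load_plan(words, pairs, batch_size=1000):
    """Return a list of (statement, parameters) tuples: nodes first, then links."""
    plan = []
    for batch in chunked(sorted(set(words)), batch_size):
        plan.append((CREATE_WORDS, {"names": batch}))
    for batch in chunked([list(p) for p in pairs], batch_size):
        plan.append((CREATE_LINKS, {"pairs": batch}))
    return plan


plan = build_load_plan(
    ["stores", "stored", "stared"],
    [("stored", "stores"), ("stared", "stored")],
    batch_size=2,
)
for statement, params in plan:
    print(statement, params)
```

Using MERGE rather than CREATE keeps the load repeatable if it is run twice, and an index on the assumed `name` property would probably speed up the MATCH step.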
Resources
● Neo4j
– http://neo4j.com/books/graph-databases/
– http://neo4j.com/graphacademy/
– http://graphgist.neo4j.com/#!/gists
– https://www.youtube.com/channel/UCvze3hU6OZBkB1vkhH2lH9Q
● Py2neo
– http://py2neo.org/2.0/
● SCOWL
– http://wordlist.aspell.net/