OK, graph databases
• Instead of tables and SQL
• Nodes and relationships
• Specialized queries
• Not everything is a graph (and this is not sponsored)
Install / Update Neo4j
• Neo4j
• http://localhost:7474
• Community Edition 3.0.3
• Python, PIP, and Py2Neo
• py2neo.__version__ = ‘3b1’
Step 0 - installing
• Install Neo4j - neo4j.com/install
• brew on Mac
• DigitalOcean has Linux instructions
• change default password
• Trouble installing locally?
• heroku addons:add graphene
Who uses graphs?
• Panama Papers
• IMDB / Six Degrees of Kevin Bacon
• Especially:
• social networks, research data, maps
• anywhere number of joins is large, indefinite, or unlimited
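The "indefinite number of joins" case can be sketched in plain Python: a breadth-first search that counts hops between two people in a tiny co-appearance graph. The names and links are made-up placeholders, not IMDB data, and this is only an illustration of the traversal that a graph database runs natively.

```python
from collections import deque

# Hypothetical co-appearance graph: person -> people they worked with.
WORKED_WITH = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),  # not connected to anyone
}


def degrees_of_separation(graph, start, goal):
    """Return the number of hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == goal:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


print(degrees_of_separation(WORKED_WITH, "A", "D"))  # 3
print(degrees_of_separation(WORKED_WITH, "A", "E"))  # None
```

In SQL, each hop would be another self-join, and the number of hops is not known before the query runs.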
The trouble with tables
• Many joins to get people, titles, photos, additional relationship info
• Speed of query
• Difficult to write new queries
Art Graph DB
• Did Picasso collaborate with other artists in his lifetime?
• Are any artists credited as painter, director, sculptor, etc.? (maybe an art EGOT)
Let’s build that graph
• Artists and artworks
• Basic bio data, MoMA ID -> Artist node
• Future DB: all people connected
• Title, date, MoMA ID -> Artwork node
• ARTIST_OF relationship (include order)
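A minimal in-memory sketch of this model, assuming artists and artworks are keyed by an ID and each ARTIST_OF link carries a credit order. All records are hypothetical placeholders, not real MoMA data, and the `collaborators` helper is invented here for illustration.

```python
from collections import defaultdict

# Hypothetical records; the IDs and titles are placeholders, not real MoMA data.
artists = {
    1: {"name": "Artist One"},
    2: {"name": "Artist Two"},
    3: {"name": "Artist Three"},
}
artworks = {
    100: {"title": "Solo Work", "date": "1950"},
    101: {"title": "Joint Work", "date": "1960"},
}
# ARTIST_OF relationships: (artist_id, artwork_id, order of credit)
artist_of = [
    (1, 100, 1),
    (1, 101, 1),
    (2, 101, 2),
]


def collaborators(artist_id):
    """Return the IDs of artists who share at least one artwork with artist_id."""
    credited = defaultdict(set)  # artwork_id -> set of artist_ids
    for a_id, work_id, _order in artist_of:
        credited[work_id].add(a_id)
    found = set()
    for artists_on_work in credited.values():
        if artist_id in artists_on_work:
            found |= artists_on_work - {artist_id}
    return found


print(sorted(artists[a]["name"] for a in collaborators(1)))  # ['Artist Two']
print(collaborators(3))  # set()
```

In a graph database the same question is a two-hop pattern, roughly (a:Artist)-[:ARTIST_OF]->(w:Artwork)<-[:ARTIST_OF]-(b:Artist) in Cypher, assuming those labels.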
Let’s build that graph
• git clone https://github.com/mapmeld/graph
• Building a scraper for MoMA
If you’re interested
• Google: MapZen Extracts
• download a city
• for this script, download the OSM XML file
• if you like PostGIS, there is a download (no import script)
Benefits of OSM
• Open to use / full data
• Open to edit / choose tags
• HOT community
• Civil e-mail lists (Crimea)
Google on OSM
• "Our maps represent what you or I need to do on a day-to-day basis in the developed part of the world"
• — Google Maps Geospatial Technologist (quoted in FastCompany)
XML data
• Nodes, ways, and relations
• Ways made up of multiple nodes
• Relations contain nodes and ways
• Practically:
• Multiple ways connect / combine
• Tags are a community construct
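To make the node / way / tag structure concrete, here is a small sketch that reads ways out of OSM XML with Python's standard library. The sample document is hand-written with made-up IDs and coordinates, and `parse_ways` is a hypothetical helper, not the script from the repo.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the OSM XML layout; IDs and coordinates are made up.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.0" lon="0.001"/>
  <node id="3" lat="0.001" lon="0.001"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="First Street"/>
  </way>
  <way id="11">
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def parse_ways(xml_text):
    """Return {way_id: {"nodes": [node ids], "tags": {key: value}}}."""
    root = ET.fromstring(xml_text)
    ways = {}
    for way in root.iter("way"):
        ways[way.get("id")] = {
            "nodes": [nd.get("ref") for nd in way.findall("nd")],
            "tags": {t.get("k"): t.get("v") for t in way.findall("tag")},
        }
    return ways


ways = parse_ways(SAMPLE)
print(ways["10"]["nodes"])  # ['1', '2']
print(ways["10"]["tags"]["name"])  # First Street
```

For a full city extract, a streaming parser such as ET.iterparse is likely a better fit than loading the whole file at once.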
Smart Renderer
• When is a <way> a line (cul-de-sac) or a polygon (river, lake, parking lot)?
• Has to support world’s fonts
• Tag for real life, not for the renderer
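The line-or-polygon question can be sketched as a heuristic: a way is a polygon candidate only if it is closed (first node equals last node), and the tags then decide. This is a deliberate simplification; real renderers consult much longer tag lists, and `is_polygon` is a hypothetical helper.

```python
def is_polygon(node_ids, tags):
    """Guess whether a way should be drawn as a filled area (simplified heuristic)."""
    closed = len(node_ids) >= 4 and node_ids[0] == node_ids[-1]
    if not closed:
        return False
    if tags.get("area") == "yes":
        return True
    if tags.get("area") == "no":
        return False
    # A closed loop of road (such as a circular cul-de-sac) stays a line.
    return "highway" not in tags


loop = ["1", "2", "3", "1"]
print(is_polygon(loop, {"natural": "water"}))  # True
print(is_polygon(loop, {"highway": "residential"}))  # False
print(is_polygon(["1", "2", "3"], {"natural": "water"}))  # False
```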
Building graph data
• Script adds all roads to Neo4j
• Includes an array of node ids (can mix content types, similar to a document database)
• If two ways share a node with the same ID, link them both ways <—>
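The linking step can be sketched without a database: index ways by the node IDs they contain, then connect every pair of ways found at the same node, in both directions. The road data is hypothetical, and `connect_ways` is an illustrative helper rather than the actual import script.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical roads: way id -> ordered list of node ids.
roads = {
    "10": ["1", "2"],
    "11": ["2", "3"],
    "12": ["4", "5"],
}


def connect_ways(ways):
    """Return a set of directed (from_way, to_way) links for ways that share a node."""
    ways_at_node = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            ways_at_node[node_id].add(way_id)
    links = set()
    for way_ids in ways_at_node.values():
        for a, b in combinations(sorted(way_ids), 2):
            links.add((a, b))
            links.add((b, a))  # link in both directions
    return links


print(sorted(connect_ways(roads)))  # [('10', '11'), ('11', '10')]
```

Indexing by node first avoids comparing every way against every other way, which matters once a whole city's roads are loaded.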
Google Prediction API
• Prediction based on a CSV
• Categorization or numerical
• Google generates a model and estimates accuracy
• Not allowed in Myanmar
Predicting Houses
• Format 60,000+ rows of database export
• Choose categories to predict 2-3 years
• Competing models determine how important each column is
• Can it parse dates? Find patterns
• Edging up to ~74 percent accuracy
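Formatting the export might look like the sketch below. It assumes the training file wants the value to predict in the first column and no header row; that layout is an assumption to check against the service's documentation, and the column names and values are placeholders.

```python
import csv
import io

# Hypothetical database export rows; the column names and values are placeholders.
rows = [
    {"label": "A", "street": "First Street", "year_built": 1950},
    {"label": "B", "street": "Second Street", "year_built": 1975},
]


def to_training_csv(records, label, features):
    """Write records as header-less CSV text with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        writer.writerow([record[label]] + [record[f] for f in features])
    return buffer.getvalue()


print(to_training_csv(rows, "label", ["street", "year_built"]))
```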
Network effect
• Adding network of streets
• Now tokens include not just my street and neighbors, but shared streets
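One way the street network could feed the model is as extra tokens in a text column: the house's own street plus every street that connects to it. This is a guess at the approach, with a hypothetical adjacency table and helper name.

```python
# Hypothetical adjacency: street name -> streets that share an intersection with it.
CONNECTED = {
    "First Street": {"Second Street", "Third Street"},
    "Second Street": {"First Street"},
    "Third Street": {"First Street"},
}


def street_tokens(street, connected):
    """Return one text field: the house's own street plus the streets it touches."""
    def token(name):
        return name.lower().replace(" ", "_")

    neighbors = sorted(connected.get(street, ()))
    return " ".join([token(street)] + [token(n) for n in neighbors])


print(street_tokens("First Street", CONNECTED))
# first_street second_street third_street
```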
Network effect
• Google Prediction API reported 81% accuracy
• But is it good?
• Early optimization studies moved fire stations and left neighborhoods vulnerable
• City can’t maintain it… hasn’t continued to open their data
Looking forward
• Ideas for graph databases?
• Ways to release large graph data: as an API? As JSON files? As a Neo4j dump?
• Ideas for statisticians / future research?