la bi, l'informatique décisionnelle et les graphes
TRANSCRIPT
Neo Technology, Inc Confidential
La BI, l'informatique décisionnelle
et les graphesPhilip Rathle, Sr Dir Products
[email protected]://twitter.com/prathle
Neo Technology, Inc Confidential
The BrainStructure: Neurons Connected by Synapses.
Processing: Signals Relayed Between Neurons through Synapses
Neo Technology, Inc Confidential
Human ThinkingStructured Unstructured (Creative)
Both forms involve processing connections
Neo Technology, Inc Confidential
Evolution of Web SearchSurvival of the Fittest
Pre-1999WWW Indexing
Discrete Data
1999 - 2012Google Invents
PageRank
Connected Data(Simple)
2012-?Google Knowledge Graph, Facebook Graph Search
Connected Data(Rich)
Neo Technology, Inc Confidential
Evolution of Online Recruiting
2010-11Resume Searching &
Scoring
Aggregated Data
Survival of the Fittest
2011-12Social Job Search
Connected Data
Neo Technology, Inc Confidential
Consumer Web Giants Depends on Five GraphsGartner’s “5 Graphs”
Social Graph
Ref: http://www.gartner.com/id=2081316
Interest Graph
Payment Graph
Intent Graph
Mobile Graph
Neo Technology, Inc Confidential
Core Industries & Use Cases: Web / ISV Finance &
InsuranceCommuni-
cationsLogistics Life
SciencesMedia &
Publishing
Education, Not-for-
Profit
Government, Aerospace,
Gaming, Other
Network Management
MDM
Social
Geo
Authorization & Access Control
Content Management
Recommend-ations
Fraud Detection,
Other
Accenture
Select Commercial Customers (Community Users Not Included)
Neo4j Adoption Snapshot*
Neo Technology, Inc Confidential
Data Storage & Processing
• Graph Databases
• Graph Compute Engines
Programming:
• Graph-Centric APIs & Languages
• Graph Algorithms
Tools:
• Visualization Tools & Libraries
• Other
Key Graph Analytic Technologies
Neo Technology, Inc Confidential
Typical Graph BI Environment
ApplicationOther
Databases
ETL
Neo4jCluster
Data Storage &Business Rules Execution
Reporting
Graph-Dashboards&Ad-hocAnalysis
GraphVisualization
End User Ad-hoc visual navigation & discovery
Bulk Analytic Infrastructure
(e.g. Graph Compute Engine)
ETL
Graph Mining & Aggregation
Data Scientist
Ad-HocAnalysis
What is aGraph Database
A graph database is an online (“real-time”) database management system with CRUD methods that expose a graph data model
• Two important properties:
• Native graph processing, includingindex-free adjacency1 to facilitate traversals
• Native graph storage engine, i.e. written from the ground up to be optimized for managing graph data
1] See Rodriguez, M.A., Neubauer, P., , “The Graph Traversal Pattern,” 2010 (http://arxiv.org/abs/1004.1001)
Overview of PopularGraph Data Models
• Property Graph
• Description: A “directed, labeled, attributed, multi-graph”1 which exposes three building blocks: nodes, typed relationships and key-value properties on both nodes and relationships
• Vendors: Neo4j, OrientDB, InfiniteGraph, Dex
• RDF Triples
• Description: URI-centered subject-predicate-object triples as pioneered by the semantic web movement2
• Vendors: AllegroGraph, Sesame
• HyperGraph
• Description: A generalized graph where a relationship can connect an arbitrary amount of nodes (compared to the more common binary graph models)3
• Vendors: HyperGraphDB, TrinityDB1] Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” 2010, http://arxiv.org/abs/1006.23612] W3C, “The Resource Description Framework (RDF),” 2004, http://www.w3.org/RDF/3] Wikipedia, http://en.wikipedia.org/wiki/Hypergraph
Graph Compute EngineProcessing platforms that enable graph global computational algorithms to be run against large data sets
Graph Mining Engine
(Working Storage)
In-Memory ProcessingSystem(s)of Record
Graph Compute Engine
Data extraction,transformation,
and load
Neo Technology, Inc Confidential
Graph Global QueriesWhat is the max/min/avg. number of connections per node?
(aka “Degree Distribution”)
Neo Technology, Inc Confidential
For the Facebook Graph Question:
What sushi restaurants in NYC do my friends like?
Neo Technology, Inc Confidential
What the Graph Looks Like:What sushi restaurants in NYC do my friends like?
Neo Technology, Inc Confidential
What the Cypher Query Looks Like:What sushi restaurants in NYC do my friends like?
START me=node:person(name = 'Philip'), location=node:location(location='New York'), cuisine=node:cuisine(cuisine='Sushi')
MATCH (me)-[:IS_FRIEND_OF]->(friend)-[:LIKES]->(restaurant) -[:LOCATED_IN]->(location),(restaurant)-[:SERVES]->(cuisine)
RETURN restaurant
Neo Technology, Inc Confidential
What the Search Looks Like:What sushi restaurants in NYC do my friends like?
Neo Technology, Inc Confidential
What Other Graph Searches Look LikeWhat drugs will bind to protein X and not interact with drug Y?
Neo Technology, Inc Confidential
#1: The Network GraphGraphs in Communications
Cell Signal Analysis
Router
Service
DEPEN
DS_ON
Switch Switch
Router
Fiber LinkFiber Link
Fiber Link
Oceanfloor Cable
DEP
END
S_O
N
DEPEN
DS_O
N
DEPENDS_ON
DEPEN
DS_O
NDEPENDS_ON
DEPENDS_ON
DEPENDS_ON
DEPENDS_ON
DEP
END
S_O
N
LINKED
LINKED
LINKED
DEPENDS_ON
“What if” Downtime Analysis(Service-to-Infrastructure Mapping)
Network Inventory & Cost Accounting
Neo Technology, Inc Confidential
#2: The Social GraphGraphs in Communications
Mobile apps, Collaboration,
Social Recommendations, and more...
Neo Technology, Inc Confidential
#3: The Call Graph Graphs in Communications
Plan & Feature Recommendations, Assess Churn Risk
Neo Technology, Inc Confidential
#4: Master Data GraphGraphs in Communications
Organizational HierarchyManagement Resource Authorization
Ref: http://www.slideshare.net/verheughe/how-nosql-paid-off-for-telenor
Neo Technology, Inc Confidential
#5: The Help Desk GraphGraphs in Communications
Online Recommendationsfor Case Avoidance
Neo Technology, Inc Confidential
Entitlements & Identity Management
Network Asset Management
Network Cell Analysis
Geo Routing(Public Transport)
BioInformatics
Emergent Graph in Other Industries(Actual Neo4j Graphs)
Insurance Risk Analysis
Neo Technology, Inc Confidential
Web Browsing Portfolio Analytics
Mobile Social ApplicationGene Sequencing
Emergent Graph in Other Industries(Actual Neo4j Graphs)
Neo Technology, Inc Confidential
Background• World’s largest provider of IT infrastructure, software
& services
• HP’s Unified Correlation Analyzer (UCA) application is a key application inside HP’s OSS Assurance portfolio
• Carrier-class resource & service management, problem determination, root cause & service impact analysis
• Helps communications operators manage large, complex and fast changing networks
Business problem• Use network topology information to identify root
problems causes on the network
• Simplify alarm handling by human operators
• Automate handling of certain types of alarms Help operators respond rapidly to network issues
• Filter/group/eliminate redundant Network Management System alarms by event correlation
Solution & Benefits• Accelerated product development time
• Extremely fast querying of network topology
• Graph representation a perfect domain fit
• 24x7 carrier-grade reliability with Neo4j HA clustering
• Met objective in under 6 months
Industry: Web/ISV, CommunicationsUse case: Network ManagementGlobal (U.S., France)
Neo Technology, Inc Confidential
Background
• One of the world’s largest logistics carriers
• Projected to outgrow capacity of old system
• New parcel routing system• Single source of truth for entire network
• B2C & B2B parcel tracking
•Real-time routing: up to 5M parcels per day
Business problem• 24x7 availability, year round• Peak loads of 2500+ parcels per second
• Complex and diverse software stack• Need predictable performance & linear
scalability
• Daily changes to logistics network: route from any point, to any point
Solution & Benefits• Neo4j provides the ideal domain fit:
• a logistics network is a graph
• Extreme availability & performance with Neo4j clustering
• Hugely simplified queries, vs. relational for complex routing
• Flexible data model can reflect real-world data variance much better than relational
• “Whiteboard friendly” model easy to understand
Industry: LogisticsUse case: Parcel Routing
Neo Technology, Inc Confidential
Industry: Online Job SearchUse case: Social / Recommendations
• Online jobs and career community, providing anonymized inside information to job seekers
Business problem• Wanted to leverage known fact that most jobs are
found through personal & professional connections
• Needed to rely on an existing source of social network data. Facebook was the ideal choice.
• End users needed to get instant gratification
• Aiming to have the best job search service, in a very competitive market
Solution & Benefits• First-to-market with a product that let users find jobs
through their network of Facebook friends
• Job recommendations served real-time from Neo4j
• Individual Facebook graphs imported real-time into Neo4j
• Glassdoor now stores > 50% of the entire Facebook social graph
• Neo4j cluster has grown seamlessly, with new instances being brought online as graph size and load have increased
Person
Company
KNO
WS
Person
Person
KNOWS
Company
KN
OW
S
WORKS_AT
WORKS_AT
Neo Technology Confidential
Background
Sausalito, CA
Neo Technology, Inc Confidential
Industry: CommunicationsUse case: Recommendations
• Cisco.com serves customer and business customers with Support Services
• Needed real-time recommendations, to encourage use of online knowledge base
• Cisco had been successfully using Neo4j for its internal master data management solution.
• Identified a strong fit for online recommendations
Solution & Benefits• Cases, solutions, articles, etc. continuously scraped
for cross-reference links, and represented in Neo4j
• Real-time reading recommendations via Neo4j• Neo4j Enterprise with HA cluster
• The result: customers obtain help faster, with decreased reliance on customer support
Neo Technology Confidential
Background
Business problem• Call center volumes needed to be lowered by
improving the efficacy of online self service
• Leverage large amounts of knowledge stored in service cases, solutions, articles, forums, etc.
• Problem resolution times, as well as support costs, needed to be lowered
Support Case
Support Case
KnowledgeBase
Article
Solution
KnowledgeBase
Article
KnowledgeBase
Article
Message
San Jose, CA
Cisco.com
Neo Technology, Inc Confidential
Interactive Television Programming
Industry: CommunicationsUse case: Social gaming
Background• Europe’s largest communications company
• Provider of mobile & land telephone lines to consumers and businesses, as well as internet services, television, and other services
Solution & Benefits• Interactive, social offering gives fans a way to
experience the game more closely
• Increased customer stickiness for Deutsche Telekom
• A completely new channel for reaching customers with information, promotions, and ads
• Clear competitive advantage
Frankfurt, Germany
Business problem• The Fanorakel application allows fans to have an
interactive experience while watching sports
• Fans can vote for referee decisions and interact with other fans watching the game
• Highly connected dataset with real-time updates
• Queries need to be served real-time on rapidly changing data
• One technical challenge is to handle the very high spikes of activity during popular games
Neo Technology, Inc Confidential
Reasons for Choosing a Graph Database
1. Order-of-magnitude improvements in query performance for complex, connected data
2. Drastically accelerated application development cycles
3. Maintainability and extensibility of the data model
4. Maturity and reliability of the product
Neo Technology, Inc Confidential
Merci !
Pour aller plus loin :
Cédric Fauvet – Votre contact en France
E-‐mail : [email protected]+er : @Neo4jFrCommunauté Francophone : meetup.com/graphdb-‐france