InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast
DESCRIPTION
Here is the presentation from Warren Davidson, Director of Business Development, and Darren Wood, InfiniteGraph chief architect. The October 21, 2010 webinar hosted by DBTA, with InfiniteGraph and Riptano, covered new data technologies and how the NoSQL ("Not Only SQL") approach helps address some of the more complex application, scalability and performance requirements of handling vast amounts of data, and of performing advanced analytics on those data volumes with greater ease and speed.
TRANSCRIPT
October 21, 2010
Warren Davidson
Darren Wood
www.infinitegraph.com
Agenda
• The NoSQL Landscape
• InfiniteGraph
• Solving what problems and how?
Copyright © InfiniteGraph
Some NoSQL Notes
• NoSQL = Not Only SQL
• NoSQL is requirements driven
• NoSQL = open source?
• NoSQL = cloud computing?
Company Confidential
The NoSQL Landscape
Cassandra
InfiniteGraph
NoSQL Landscape
(Chart axes: Complexity and Performance)
• Key Value Stores: Voldemort – LinkedIn, Dynamo – Amazon
• BigTable Clones: Cassandra – Facebook, HBase – Apache/Hadoop, Hypertable
• Document databases: CouchDB – Apache, MongoDB
• Graph Databases: Neo4j, HypergraphDB, AllegroGraph, Sones
• Graph database uses: Social Network Analysis, Intelligence Community
Graph Databases
• A graph database is used to trace relationships among entities, most commonly people, to any depth. Its characteristics are:
  – Very simple, fixed schema
  – Very complex data relationships
  – Used to support complex associations among like entities
[Figure: Meta-Model Instance Example (simplified). Labeled parts: Node, Edge, Attribute(s). Nodes shown: John Jones, Jane Jones-Smith, Nancy Jones, Paul Jones, Doris Smith, Jim Smith, Jeff Smith.]
InfiniteGraph
A business unit of Objectivity
• In the business of distributed data management for over 10 years
• Solving graph data problems for over 8 years
• Focusing on the emerging requirements of graph data for cloud and on-premise distributed systems
Graphs are everywhere
Enterprise and government 2.0, bio-engineering, gene sequencing, drug development…
LinkedIn, Facebook…
Social network analytics, social CRM…
Network analysis, complex BoM, predictive and real-time ISR, fraud detection and response…
Graph Databases – What’s So Different?
Darren Wood
Chief Architect, InfiniteGraph
Graph Databases
• Key technical attributes
• How InfiniteGraph addresses these
• Query and navigation
• Challenges/Requirements of Distribution
• Practical applications
Graph Databases
• Optimized around data relationships
  – Relationships as first class citizens
  – Super fast navigation between entities
  – Rich/flexible annotation of connections
• Small focused API (typically not SQL)
  – Natively work with concepts of Vertex/Edge
  – SQL has no concept of “navigation”
  – Most attempts based in SQL are convoluted
Physical Storage Comparison
Rows/Columns/Tables

Meetings
P1     P2     Place    Time
Alice  Bob    Denver   5-27-10

Calls
From   To       Time    Duration
Bob    Carlos   13:20   25
Bob    Charlie  17:10   15

Payments
From    To       Date     Amount
Carlos  Charlie  5-12-10  100000

Relationship/Graph Optimized

Alice --Met (5-27-10)--> Bob
Bob --Called (13:20)--> Carlos
Bob --Called (17:10)--> Charlie
Carlos --Paid (100000)--> Charlie
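The difference between the two layouts on this slide can be illustrated with a small Python sketch. It is not InfiniteGraph code: the function names are hypothetical, the data is the Meetings, Calls and Payments content from the slide, and every relationship is treated as directed from the first person to the second, which is an assumption.

```python
# Rows as they would sit in relational tables (data taken from the slide).
meetings = [("Alice", "Bob", "Denver", "5-27-10")]       # P1, P2, Place, Time
calls = [("Bob", "Carlos", "13:20", 25),                  # From, To, Time, Duration
         ("Bob", "Charlie", "17:10", 15)]
payments = [("Carlos", "Charlie", "5-12-10", 100000)]     # From, To, Date, Amount


def neighbors_by_scan(person):
    """Table-style lookup: every table is scanned to find connected people."""
    found = []
    for p1, p2, _place, _time in meetings:
        if p1 == person:
            found.append(("Met", p2))
    for src, dst, _time, _duration in calls:
        if src == person:
            found.append(("Called", dst))
    for src, dst, _date, _amount in payments:
        if src == person:
            found.append(("Paid", dst))
    return found


def build_adjacency():
    """Graph-style layout: each vertex keeps its own list of outgoing edges."""
    adjacency = {}
    for p1, p2, place, time in meetings:
        adjacency.setdefault(p1, []).append(("Met", p2, {"place": place, "time": time}))
    for src, dst, time, duration in calls:
        adjacency.setdefault(src, []).append(("Called", dst, {"time": time, "duration": duration}))
    for src, dst, date, amount in payments:
        adjacency.setdefault(src, []).append(("Paid", dst, {"date": date, "amount": amount}))
    return adjacency


def neighbors_by_navigation(adjacency, person):
    """One dictionary lookup, however many rows exist for other people."""
    return [(label, target) for label, target, _attrs in adjacency.get(person, [])]


graph = build_adjacency()
print(neighbors_by_scan("Bob"))
print(neighbors_by_navigation(graph, "Bob"))
```

Both calls return the same answer. The point of the sketch is only that the scan touches every row of every table, while the navigation touches just the edges stored with the vertex.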
Query and Navigation
• Queries – but not as you know them
• More like a rules-based search and discovery
• Asynchronous results
Alice --Meets--> Bob --Calls--> Carlos --Pays--> Charlie
Bob --Calls--> Charlie
“Find all paths between Alice and Charlie”
“Find all paths between Alice and Charlie – within 2 degrees”
“Find all paths between Alice and Charlie – events in May 2010”
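The three example queries can be sketched as one path search with optional rules. This is an illustrative Python sketch, not the InfiniteGraph API. `find_paths`, `describe` and `in_may_2010` are hypothetical names, "within 2 degrees" is read as "at most 2 hops", and the dates on the two call edges are invented for the example because the slides give only times of day. A generator yields each path as soon as it is found, a loose stand-in for the asynchronous results mentioned above.

```python
# Edges as (source, label, target, attributes).
# The dates on the "Calls" edges are hypothetical.
EDGES = [
    ("Alice", "Meets", "Bob", {"date": "2010-05-27"}),
    ("Bob", "Calls", "Carlos", {"date": "2010-05-28"}),
    ("Bob", "Calls", "Charlie", {"date": "2010-06-02"}),
    ("Carlos", "Pays", "Charlie", {"date": "2010-05-12"}),
]


def find_paths(edges, start, goal, max_hops=None, edge_filter=None):
    """Yield every simple path from start to goal as a list of edges."""
    outgoing = {}
    for edge in edges:
        if edge_filter is None or edge_filter(edge):
            outgoing.setdefault(edge[0], []).append(edge)

    def walk(vertex, path, visited):
        if vertex == goal:
            yield list(path)
            return
        if max_hops is not None and len(path) >= max_hops:
            return  # hop budget used up without reaching the goal
        for edge in outgoing.get(vertex, []):
            target = edge[2]
            if target not in visited:  # keep paths simple (no revisits)
                path.append(edge)
                visited.add(target)
                yield from walk(target, path, visited)
                visited.remove(target)
                path.pop()

    yield from walk(start, [], {start})


def describe(path):
    """Render a path such as 'Alice -Meets-> Bob -Calls-> Charlie'."""
    if not path:
        return ""
    text = path[0][0]
    for _src, label, target, _attrs in path:
        text += f" -{label}-> {target}"
    return text


def in_may_2010(edge):
    """Rule: keep only events dated May 2010."""
    return edge[3]["date"].startswith("2010-05")


all_paths = [describe(p) for p in find_paths(EDGES, "Alice", "Charlie")]
within_two = [describe(p) for p in find_paths(EDGES, "Alice", "Charlie", max_hops=2)]
may_only = [describe(p) for p in find_paths(EDGES, "Alice", "Charlie", edge_filter=in_may_2010)]
```

Under these assumptions the unrestricted query finds both routes, the two-hop limit keeps only the route through Bob's direct call to Charlie, and the May 2010 rule keeps only the route through Carlos.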
Management of Large Data Graphs
• Graphs grow quickly
  – Billions of phone calls / day in US
  – Emails, social media events, IP traffic
  – Financial transactions
• Some analytics require navigation of large sections of the graph
• Each step (often) depends on the last
• Must distribute data and go parallel
Graph Partitioning
• Graph partitioning is not simple
• Graph operations are rarely partition bound
• Graphs are ‘alive’
• Repartitioning is expensive
• Partitions must co-operate
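To see why graph operations are rarely partition bound, the sketch below assigns vertices to partitions by hashing their names and then counts the edges whose endpoints land in different partitions. Hash partitioning is an assumption chosen for simplicity, not the scheme InfiniteGraph uses, and `partition_of` and `split_edges` are hypothetical names.

```python
import zlib

# Example edges (source, target); the people come from the earlier slides.
CALL_EDGES = [
    ("Alice", "Bob"),
    ("Bob", "Carlos"),
    ("Bob", "Charlie"),
    ("Carlos", "Charlie"),
]


def partition_of(vertex, partition_count):
    """Assign a vertex to a partition using a stable hash of its name."""
    return zlib.crc32(vertex.encode("utf-8")) % partition_count


def split_edges(edges, partition_count):
    """Separate edges that stay inside one partition from those that cross."""
    local, crossing = [], []
    for src, dst in edges:
        if partition_of(src, partition_count) == partition_of(dst, partition_count):
            local.append((src, dst))
        else:
            crossing.append((src, dst))
    return local, crossing


local, crossing = split_edges(CALL_EDGES, partition_count=4)
print(f"{len(local)} local edges, {len(crossing)} crossing edges")
```

Every crossing edge is a step where a navigation has to leave its partition, which is why the partitions must co-operate. Changing `partition_count` reassigns most vertices, a small-scale picture of why repartitioning is expensive.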
Graph Partitioning – Reality!
[Diagram: Application(s) on top of a Distributed API, which spans Partition 1, Partition 2, Partition 3, … Partition n, each with its own Processor.]
Distributed Graph Must Haves
• High performance distributed persistence
• Ability to deal with remote data reads (fast)
• Intelligent local cache of subgraphs
• Distributed navigation processing
• Distributed, multi-source concurrent ingest
• Write modes supporting both strict and eventual consistency
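As one illustration of the local-cache item, here is a minimal Python sketch of a least-recently-used cache placed in front of remote adjacency reads. LRU eviction is a simple stand-in chosen for this example; the slides do not say how InfiniteGraph decides what to cache. `SubgraphCache`, `fetch_remote` and `REMOTE_ADJACENCY` are hypothetical names.

```python
from collections import OrderedDict

# Hypothetical stand-in for adjacency data held on other partitions.
REMOTE_ADJACENCY = {
    "Alice": ["Bob"],
    "Bob": ["Carlos", "Charlie"],
    "Carlos": ["Charlie"],
    "Charlie": [],
}


def fetch_remote(vertex):
    """Pretend network read; a real system would contact another partition."""
    return REMOTE_ADJACENCY.get(vertex, [])


class SubgraphCache:
    """Small LRU cache of adjacency lists fetched from remote partitions."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch
        self._capacity = capacity
        self._entries = OrderedDict()
        self.remote_reads = 0  # counts how often the cache had to go remote

    def neighbors(self, vertex):
        if vertex in self._entries:
            self._entries.move_to_end(vertex)  # mark as most recently used
            return self._entries[vertex]
        self.remote_reads += 1
        result = self._fetch(vertex)
        self._entries[vertex] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result


cache = SubgraphCache(fetch_remote, capacity=2)
cache.neighbors("Bob")
cache.neighbors("Bob")  # served locally, no second remote read
print(cache.remote_reads)
```

A navigation that keeps returning to the same vertices pays for the remote read once. A production cache would also need invalidation when remote data changes, which this sketch leaves out.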
Practical Applications
Graph Analysis (Algorithms)
• Social Networks
  – Most connected participants
  – Influencers
  – Important Syndicates or Sub-networks
• Central figures in crime organisations
• Business Intelligence
  – Discovering Knowledge Assets
  – Complex analytics
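"Most connected participants" is the simplest of these analyses. A minimal Python sketch, assuming that "most connected" means highest degree (the number of edges touching a vertex) and using the hypothetical name `most_connected`:

```python
from collections import Counter

# Example relationships (source, target), using people from the earlier slides.
RELATIONSHIPS = [
    ("Alice", "Bob"),
    ("Bob", "Carlos"),
    ("Bob", "Charlie"),
    ("Carlos", "Charlie"),
]


def most_connected(edges, top_n=1):
    """Rank vertices by degree, counting both ends of every edge."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return degree.most_common(top_n)


print(most_connected(RELATIONSHIPS))  # [('Bob', 3)]
```

Degree is only one notion of importance. Finding influencers or central figures usually calls for richer measures such as betweenness or PageRank-style scores, which need the wider graph navigation described earlier.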
Graph Analysis (Patterns)
• Crime (again)
  – Recognize common patterns of activity
  – Complex chains of interaction
• Security
  – Recognize attack/threat patterns
  – Auditing / log analytics
• Targeted Advertising
  – To specific browsing patterns
Many, Many More!
• Spatial data
• Defence / Situational Awareness
• Sciences
• Health Care
• Genealogy
• Logistics
• Tracking