The Perfect Fit: Scalable Graph for Big Data
Twitter Tag: #briefr The Briefing Room
Mission
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today’s innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
Topics
June: INNOVATORS
July: SQL INNOVATION
August: REAL-TIME DATA
When You’re Hot…
• Biggest Web engines use graph
• Very powerful for finding relationships
• More versatile than other DB formats
• Great for unwinding complex scenarios
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
@robinbloor
SYSTAP
SYSTAP builds highly scalable open source solutions for big graphs.
Its flagship product is Blazegraph, a platform that supports semantic web and graph database APIs. It features fault-tolerant storage and query capabilities, plus online backup and failover.
Blazegraph achieves its scale and high throughput by leveraging GPU acceleration via its Mapgraph technology.
Guest: Brad Bebee
Brad Bebee is the CEO and Managing Partner at SYSTAP, LLC. Brad leads the efforts to use SYSTAP technologies for high performance graph databases and analytics to deliver solutions for multiple business and mission areas. Over the course of his career, he has served as a CTO and CFO, managed operating divisions, and performed advanced technology development for commercial and government customers. He is an active contributor to SYSTAP’s open source software projects. His technology experience ranges from early work in modeling methodologies and knowledge representation, dating back to precursors of DARPA’s DAML program, to more recent work with large-scale data analytics using the Hadoop ecosystem, Accumulo, and related technologies. He has extensive experience in architecture and software modeling methodologies, where he has led and collaborated on multiple publications receiving recognition for his research.
http://blazegraph.com/
The Perfect Fit: Scalable Graph for Big Data
June 30, 2015 Bloor Group Briefing Room
Big Data Startup Award Winner: 2015 Big Data Innovations Summit
Helping customers achieve their business objectives with graph data is our vision, mission, and the essence of our software solutions.
Today, we serve Fortune 500 companies, startups, governments, and research organizations with technology to power their graphs.
Graph Databases Grew at Over 500% in the Last Two Years
[Chart: Popularity changes per category, March 2015. Y-axis: Popularity Changes. Labeled series: Graph Databases.]
The Amount of Graph Data is Exploding
Billion+ Edges
SYSTAP™, LLC © 2006-2015 All Rights Reserved
Graph Applications are Everywhere
• Community Detection / Clustering
• Recommendation Systems
• Fault Prediction in Industrial and Internet of Things (IoT)
• Drug Discovery / Repurposing
• Precision Medicine / Genomics
• Fraud Detection
• Time Series, Compliance
• Cyber
• Defense / Security
Graphs are different. You need the right paradigm and hardware to scale
https://datatake.files.wordpress.com/2015/09/latency.png
Graph Cache Thrash: The CPU just waits for graph data from main memory...
[Chart: Access Latency Per Clock Cycle, by Type of Cache or Memory]
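The cache-thrash point can be made concrete with a small sketch. It assumes a compressed sparse row (CSR) adjacency layout, which the slides do not specify, and every name in it (`build_csr`, `bfs_access_trace`, the toy edge list) is hypothetical. It records which vertex slots a breadth-first traversal touches, in order, to show that consecutive reads land far apart in memory rather than side by side.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) from a directed edge list."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def bfs_access_trace(offsets, targets, source):
    """Run BFS and return the sequence of vertex slots read, one per edge."""
    visited = [False] * (len(offsets) - 1)
    visited[source] = True
    queue = deque([source])
    trace = []
    while queue:
        v = queue.popleft()
        for i in range(offsets[v], offsets[v + 1]):
            w = targets[i]
            trace.append(w)  # every traversed edge reads visited[w]
            if not visited[w]:
                visited[w] = True
                queue.append(w)
    return trace


# Toy graph (illustrative data only, not from the slides).
edges = [(0, 7), (0, 3), (3, 6), (7, 1), (1, 5), (6, 2), (5, 4)]
offsets, targets = build_csr(8, edges)
trace = bfs_access_trace(offsets, targets, 0)
# Distance between consecutive reads; a sequential scan would give all 1s.
jumps = [abs(b - a) for a, b in zip(trace, trace[1:])]
print(trace, jumps)
```

On a graph with billions of edges the same pattern means most reads miss the cache, which is the waiting the slide describes.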
Solutions to the Graph Scaling Problem Using Graph Databases and GPUs
● Embedded
● High Availability
● Scale-out
● GPU Acceleration
● 100s of Times Faster than CPU main memory-based systems
● Up to 40X Cheaper
● 10,000X Faster than disk-based technologies
Finding the Next Cure for Cancer is a Billion+ Edge Graph Challenge
Source: Shin, Dmitriy et al. Uncovering influence links in molecular knowledge networks to streamline personalized medicine. Journal of Biomedical Informatics, Volume 52, 394-405.
Graphs Enable Enterprises to Manage Metadata
• Data outlives specific system implementations.
• Data outlives applications.
• Achieve metadata independence using declarative standards to manage metadata and express transformations.
[Diagram labels: Data Sources; Data Providers; Knowledge Graph: Instance Data + Ontology (RDF + OWL); ACLs; Query Catalog; Constraints; Rules; Events; Mappings; Widgets; Views]
Knowledge Base of Biology (KaBOB)
Open Biomedical Ontologies
biomedical data & information
application data
biomedical knowledge
17 databases: Entrez Gene, DIP, UniProt, GOA, GAD, HGNC, InterPro, …
12 ontologies: Gene Ontology, Sequence Ontology, Cell Type Ontology, ChEBI, NCBI Taxonomy, Protein Ontology, …
Powering Their Graphs with Blazegraph™
Information Management / Retrieval
Genomics / Precision Medicine
Defense, Intel, Cyber
The right scaling approach depends on the business need
[Chart: Data Scale (Edges), from Millions through Billions to Trillions, plotted against Speed, from Fast to Fastest]
Embedded and Single Server (50B); diagram labels: JVM, Journal
High Availability (50B)
Scale Out (1T+)
Single GPU (500+M)
Multi-GPU Clusters (100+B)
Blazegraph™ stands out!
• Wikimedia Evaluation: https://docs.google.com/a/systap.com/spreadsheets/d/1MXikljoSUVP77w7JKf9EXN40OB-ZkMqT8Y5b2NYVKbU/edit#gid=0
Blazegraph™: Embedded and Single Server
• High performance, Scalable
  – 50B edges/node
  – RDF/SPARQL level query language
  – Efficient Graph Traversal
  – High 9s solution
• Property graphs
  – Blueprints, Gremlin, Rexster
• REST API (NSS)
• Extension points
  – Stored queries for custom application logic on the server.
  – Custom services & indices
  – Custom functions
  – Vertex-centric programs
• Embedded Server
• Standalone Server
[Diagram labels: JVM with Journal; WAR with Journal]
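For readers new to RDF, the idea behind an "RDF/SPARQL level query language" can be sketched in a few lines. This is not Blazegraph's API: the functions `match` and `query` and the sample triples are hypothetical, and the sketch only mimics how a SPARQL basic graph pattern binds variables against subject-predicate-object triples.

```python
def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern; '?x' marks a variable."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join patterns left to right, substituting bindings found so far."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Illustrative data only; the identifiers are made up.
triples = [
    ("geneA", "encodes", "proteinA"),
    ("proteinA", "interactsWith", "proteinB"),
    ("geneB", "encodes", "proteinB"),
]

# Which genes encode a protein that interacts with proteinB?
results = query(triples, [
    ("?gene", "encodes", "?protein"),
    ("?protein", "interactsWith", "proteinB"),
])
print(results)
```

A real triple store answers the same kind of pattern from indexes instead of scanning every triple.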
Blazegraph™: High Availability
• Shared nothing architecture
  – Same data on each node
  – Coordinate only at commit
  – Transparent load balancing
• Scaling
  – 50 billion triples or quads
  – Query throughput scales linearly
• Self healing
  – Automatic failover
  – Automatic resync after disconnect
  – Online single node disaster recovery
• Online Backup
  – Online snapshots (full backups)
  – HA Logs (incremental backups)
• Point in time recovery (offline)
[Diagram: Quorum k=3, size=3; three HAService nodes, one leader and followers]
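The "Quorum k=3" label can be read through a short worked example. The sketch assumes a simple-majority rule, which the slide does not state, and the function names are hypothetical rather than taken from Blazegraph.

```python
def quorum_size(k):
    """Smallest number of services that forms a simple majority of k."""
    return k // 2 + 1


def tolerated_failures(k):
    """How many of the k services can be down while a majority remains."""
    return k - quorum_size(k)


def quorum_met(k, joined):
    """True when enough services have joined to form a majority."""
    return joined >= quorum_size(k)


k = 3
print(quorum_size(k), tolerated_failures(k), quorum_met(k, 2), quorum_met(k, 1))
```

Under that assumption a three-node cluster keeps serving with one node down and stops accepting commits when only one node is left.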
Blazegraph™: Scale-out
• Shard-based horizontal scale-out to support 1 Trillion+ Edge Graphs
• Fast parallel load
• Efficient Query Through Coordination Between Data Services
• Coming soon! Support for HDFS for failover.
How do I use GPUs to scale graphs?
● Parallel Processing on GPU Clusters for Trillion+ Edge Graphs
● High-Level API
● Partitioning and Overlapping Communications
● HPC and DARPA Pedigree
Blazegraph GPU: Ridiculously Fast for Graphs
Blazegraph™ plug-in for GPU Acceleration with familiar graph APIs
Graph DB
Blazegraph Multi-GPU: Extreme Scale, 40X more Affordable!
1 GTEP = 1 Billion Traversed Edges Per Second
Large Hadoop Cluster: ~$18M / GTEP
Cray XMT-2: ~$180K / GTEP
Mapgraph HPC with NVIDIA GPUs: $16K / GTEP (K40 - Today), $4K / GTEP (Pascal 2016)
Future: Blazegraph SaaS On-demand
Chart callouts: 40X! 10X!
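The price figures on this slide can be checked with simple arithmetic. The sketch below takes the approximate dollar amounts at face value and uses hypothetical names throughout. The slide does not say which pairs the 10X and 40X callouts compare; one plausible reading is Cray XMT-2 against the K40 and Pascal prices respectively.

```python
# Cost per GTEP in US dollars, as listed on the slide ("~" values taken as exact).
COST_PER_GTEP_USD = {
    "Large Hadoop Cluster": 18_000_000,
    "Cray XMT-2": 180_000,
    "Mapgraph K40": 16_000,
    "Mapgraph Pascal": 4_000,
}


def cost_ratio(baseline, candidate):
    """How many times cheaper the candidate is than the baseline, per GTEP."""
    return COST_PER_GTEP_USD[baseline] / COST_PER_GTEP_USD[candidate]


def gteps(traversed_edges, seconds):
    """Convert a traversal measurement to billions of traversed edges per second."""
    return traversed_edges / seconds / 1e9


print(cost_ratio("Cray XMT-2", "Mapgraph K40"))       # about 11x
print(cost_ratio("Cray XMT-2", "Mapgraph Pascal"))    # 45x
print(cost_ratio("Large Hadoop Cluster", "Mapgraph K40"))
print(gteps(2_000_000_000, 0.5))                      # 2 billion edges in half a second
```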
Johnny-Come-Lately
Aside from the three-letter agencies, until recently nobody cared much about graphs…
WHY?
Reasons for Graph Apathy…
1 Unfamiliarity (it’s obscure because it’s obscure)
2 RDBMS do not store graphs well and SQL is inadequate for querying graphs
3 No common BI applications, it’s mainly analytics
4 Semantic technology has taken a lifetime to evolve
Reasons to Care
• Graphs express very different (and important) data relationships
• Graphs are largely unexplored
• Graphs are ideal for MDM
• Graphs express semantic relationships
Semantics: The Type 0 Language
Colorless green ideas sleep furiously
[Diagram: the sentence broken into “Colorless green”, “ideas”, “sleep”, “furiously”]
The Net Net
The ultimate goal is INFERENCING:
Knowledge discovery (rather than pattern discovery)
through graph processing
• What are the “low-hanging fruit” graph applications in your company’s experience?
• Does your company find itself competing with Hadoop Giraph? What are the compelling differences?
• Is Blazegraph a triple-store at the physical level (i.e., a pure RDF implementation) or does it implement a variety of physical structures?
• At what level of data volume/workload is hardware acceleration a necessity?
• What is the largest amount of data currently under management with any of your customers?
• Which companies/technologies do you compete with directly?
Upcoming Topics
www.insideanalysis.com
June: INNOVATORS
July: SQL INNOVATION
August: REAL-TIME DATA