Page 1

bigdata® Presented at Xcelerate 3/18/2014

SYSTAP, LLC

Graphs, Graph Databases

Graph Analytics on GPUs

1 http://www.bigdata.com/blog

Page 2

Graph Database
• High performance, scalable
  – 50B edges/node
  – High-level query language
  – Efficient graph traversal
  – High 9s solution
• Open source (subscriptions)
  – Autodesk, EMC, market data, genomics and personalized medicine, etc.

GPU Analytics
• Extreme performance
  – 5-100x faster than GraphLab
  – 10,000x faster than graph databases
• DARPA funding
• Disruptive technology
  – Early adopters
  – Huge ROIs
• Open source

SYSTAP™, LLC © 2006-2013 All Rights Reserved


 

Small business, founded 2006. 100% employee owned.

SYSTAP, LLC

Page 3

Related “Graph” Technologies


Redpoint repositions existing technology. MPGraph compares favorably with high-end hardware solutions from YARC, Oracle, and SAP, but is open source and uses commodity hardware.

Pair up bigdata and MPGraph

STTR

Page 4

Similar models, different problems
• Graph query and graph analytics (traversal/mining)
  – Related data models
  – Very different computational requirements
• Many technologies are a bad match or a limited solution
  – Key-value stores (Bigtable, Accumulo, Cassandra, HBase)
  – Map-reduce
• Anti-pattern
  – Dump all data into a “big bucket”

   


Page 5

Similar models, different problems
• Graph query and graph analytics (traversal/mining)
  – Related data models
  – Very different computational requirements
• Many technologies are a bad match or a limited solution
  – Key-value stores (Bigtable, Accumulo, Cassandra, HBase)
  – Map-reduce
• Anti-pattern
  – Dump all data into a “big bucket”

Storage and computation patterns must be correctly matched for high performance.


Page 6

Optimize for the right problem
• Graph query
  – Declarative query language (SPARQL)
    • Query optimization is critical for performance.
  – Index locality (1D partitioning, multiple indices)
    • Get everything about a subject on one page of the index.
  – Scale-out must flow queries over the data
    • Otherwise it slams the network and the client.
  – Must order and constrain joins to read as little data as possible
    • As-bound vectored nested index joins (bigdata)
    • Sideways information passing and merge joins (RDF3X)
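The as-bound vectored nested index join listed above can be sketched in a few lines, assuming a single sorted SPO index held in a Python list and a chunk of solutions represented as dicts. This is a toy illustration of the access pattern, not bigdata's operator: the names `prefix_scan` and `vectored_join` and the sample triples are hypothetical, and a real engine probes on-disk indices and chooses among several index orders.

```python
from bisect import bisect_left

# Hypothetical toy SPO index: triples kept sorted by (subject, predicate, object).
SPO = sorted([
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "dave"),
    ("carol", "knows", "erin"),
    ("dave", "name", "Dave"),
])


def prefix_scan(index, prefix):
    """Yield the triples whose leading terms equal `prefix` (one ordered range read)."""
    i = bisect_left(index, prefix)
    while i < len(index) and index[i][:len(prefix)] == prefix:
        yield index[i]
        i += 1


def vectored_join(solutions, s_var, predicate, o_var):
    """Join a chunk of solutions with the triple pattern (?s_var predicate ?o_var).

    ?s_var must already be bound in every solution, so each probe is an
    "as-bound" lookup on the (subject, predicate) key prefix. The chunk is
    sorted on that key first so the probes visit the index in key order.
    """
    out = []
    for sol in sorted(solutions, key=lambda b: b[s_var]):
        for _, _, obj in prefix_scan(SPO, (sol[s_var], predicate)):
            out.append({**sol, o_var: obj})
    return out


# (?x knows ?y) with ?x bound to "alice", then (?y knows ?z) for the whole chunk.
chunk = vectored_join([{"x": "alice"}], "x", "knows", "y")
result = vectored_join(chunk, "y", "knows", "z")
print(result)
```

Passing a whole chunk of bindings into each join, rather than one solution at a time, is what lets the engine batch and order its index probes.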


Page 7

Vectored Query in Scale-Out


Page 8

Optimize for the right problem
• Storage and computation patterns must be correctly matched for high performance.
• Graph analytics:
  – Parallelism: work must be distributed and balanced.
  – Memory bandwidth: memory, not disk, is the bottleneck.
  – 2D partitioning: O(N) communications pattern (versus O(N*N)).


[Figure panels: BFS, PR]
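One way to see why 2D partitioning reduces communication: if p processors are arranged as a square grid and each one exchanges data only along its own processor row and column, it has 2(√p - 1) communication partners instead of the p - 1 partners of a 1D all-to-all exchange. The sketch below only counts partners under those simplifying assumptions (square grid, worst-case 1D exchange); it illustrates the trend and is not a model of MPGraph's actual communication schedule.

```python
import math


def peers_1d(p):
    """1D partitioning: in the worst case every processor exchanges with every other one."""
    return p - 1


def peers_2d(p):
    """2D partitioning on a sqrt(p) x sqrt(p) grid: exchange only with the other
    processors in the same grid row and the same grid column."""
    r = math.isqrt(p)
    if r * r != p:
        raise ValueError("this sketch assumes p is a perfect square")
    return (r - 1) + (r - 1)


for p in (4, 16, 64, 256):
    print(p, peers_1d(p), peers_2d(p))
```

For p = 64 this gives 63 partners per processor for 1D versus 14 for 2D, and the gap widens as p grows.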

Page 9

Accelerated Graph Analytics


Page 10

The Semantic Web
• The Semantic Web is a stack of standards developed by the W3C for the interchange and query of metadata and graph-structured data.
  – Open data
  – Linked data
  – Multiple sources of authority
  – Self-describing data and rules
  – Federation or aggregation
  – And, increasingly, provenance

Page 11

The Standards or the Data?

TBL, 2000. S. Bratt, 2006.

Page 12

The data – it’s about the data

Page 13

The killer “big data” app
• Clouds + “Open” Data = Big Data Integration
• Critical advantages
  – Fast integration cycle
  – Open standards
  – Integrate heterogeneous data, linked data, structured data, data at rest, and streams.
  – Maintain fine-grained provenance of federated data.
  – Opportunistic exploitation of data
    • Fragmented information
    • Dynamic information
    • Latent information (graph mining)

Page 14

Unifying Architecture (example)


[Diagram: unifying architecture]
• Heterogeneous data sources as input: streams, unstructured, semi-structured, structured
• Unified data model: federate, aggregate, discover; resource centric (linked data)
• Unified compute and storage model, connected by a data bus:
  – Update/query database (SSD): aggregated or federated; high-level query (SPARQL)
  – Graph mining (GPUs): graph traversal / mining; “think like a vertex”
  – Data cache: key-value stores
• Business logic, web clients, peer systems

Page 15

Road Map
• Column-wise
  – Faster load and query.
  – Increased data density and scaling.
  – Integration point with GPU (shared data).
• Multi-node GPU
  – 2D decomposition (DARPA STTR)
• Performance optimization for scale-out
  – Reducing latency and increasing throughput
  – Integration point for SPARQL acceleration and 2D GPU cluster.
• SPARQL on GPU
  – Query at 3 billion edges/second
  – Same underlying library, but horizontal scaling is NOT 2D.


Page 16

Customers and Use Cases


Page 17

17 © Copyright 2012 EMC Corporation. All rights reserved.


Manufacturing Product Data is Heterogeneous

…and difficult to find, re-use, and share

Autodesk PLM360

Page 18

Hadoop / bigdata® pipeline


[Diagram components: inference cloud; HA query cloud; map/reduce layer; two durable queues carrying +/- stmts; BD journals on PFS / HDFS]
• Linear scaling on query throughput
• Scalable inference workload
• Can be used for custom quads-mode inference strategies

Page 19

Federated Query & Custom Services
• Language extension for remote services: SERVICE uri { graph-pattern }
• Integrated into the vectored query engine
  – Solutions vectored into, and out of, remote end points.
  – Control evaluation order with query hints:
    • runFirst, runLast, runOnce, etc.
• ServiceRegistry
  – Configure service end point behavior
  – Embed custom “services” (custom indices, monitor transactions, etc.).

SERVICE { …. } hint:Prior hint:runFirst "true" .

Page 20

High Availability
• Shared nothing architecture
  – Same data on each node
  – Coordinate only at commit
• Scaling
  – 50 billion triples or quads
  – Query throughput scales linearly
• Self healing
  – Automatic failover
  – Automatic resync after disconnect
  – Online single node disaster recovery
• Online backup
  – Online snapshots (full backups)
  – HA logs (incremental backups)
• Point in time recovery (offline)


[Diagram: HA quorum (k=3, size=3) of three HAService instances, one leader and two followers]

Page 21

EMC ProSphere
• Host-to-storage management solution
  – No single model. Information must be combined from many device vendors. RDF is a natural solution.
  – Application deployed as an appliance into data centers everywhere.
    • Bundles the bigdata platform.
  – 220+ engineers in the US and India.
• SYSTAP
  – Provides support, custom services, feature development, training.


Page 22

22 © Copyright 2012 EMC Corporation. All rights reserved.

bigdata® usage: EMC ProSphere

[Diagram components: bigdata RDF store (journal); SAIL/bigdata API; Topology Service and Topology Service JVM; REST API; SAN topology view; Maps Service and Maps Service JVM; temp store; RDF/XML; GraphML; Flex UI]

Page 23

Knowledge Base of Biology (KaBOB)

[Diagram labels: biomedical data & information; application data; biomedical knowledge]
• 17 databases: Entrez Gene, DIP, UniProt, GOA, GAD, HGNC, InterPro, …
• 12 ontologies (Open Biomedical Ontologies): Gene Ontology, Sequence Ontology, Cell Type Ontology, ChEBI, NCBI Taxonomy, Protein Ontology, …

Page 24

bigdata® scale-out federation

[Diagram components]
• Distributed index management and query: many data services
• RDF data and SPARQL query management functions: client services
• Registrar, Zookeeper, Shard Locator, Transaction Mgr, Load Balancer
• Unified API used by application clients
• Formats: SPARQL XML, SPARQL JSON, RDF/XML, N-Triples, N-Quads, Turtle, TriG, RDF/JSON

Page 25

Kepler GK110 Die Photo


• Most complex commercial IC
  – 7.1 billion transistors.
  – 3x gain in power efficiency.
  – 2,496 CUDA cores.
  – 1.5 MB L2 cache.

Page 26

Graph Processing

GPUs, Graphs, and Graph Data Mining


Page 27

GPU Graph Processing
• Motivation: speed
  – 3 out of the top 5 supercomputers are GPU clusters
  – 3.3 B traversed edges per second (one GPU: Merrill, 2011)
  – 8.3 B traversed edges per second (quad-GPU configuration: ibid.)
• Goal
  – Blindingly fast SPARQL query and graph data mining on GPU clusters
  – 20 minutes on Accumulo => 27 milliseconds on a GPU.
• Open source
  – Deploy on workstations, HPC clusters, EC2, or your own data center


Page 28

[Figure: Historical Single-/Double-Precision Peak Compute Rates. GFLOPS (log scale, 10^1 to 10^3) by date, 2002 to 2012, with SP and DP series for AMD (GPU), NVIDIA (GPU), Intel (CPU), and Intel Xeon Phi.]

Latest Top500
• Why?
  – FLOPS/m²
  – FLOPS/$
  – FLOPS/W

Many Core is the Future
• Top 500 supercomputer sites:
  – 3 out of top 5 are GPU clusters (11/2011)
  – #1 and #8 (11/2012)
• CPU clock rates are stagnant.
• Simple compute units + parallelism => increased performance.


Page 29

GPUs – A Game Changer for Graph Analytics?

[Figure: million traversed edges per second (0 to 3,500) vs. average traversal depth (1 to 100,000, log scale) for NVIDIA Tesla C2050, multicore per socket, and sequential]

• Graphs are everywhere in data, and are also a powerful data model for federation.
• GPUs may be the technology that finally delivers real-time analytics on large graphs.
  – 10x speedup over CPU
  – 10x DRAM bandwidth
• This is a hard problem
  – Data-dependent parallelism
  – Non-locality
  – PCIe bus is the bottleneck
• Significant speedup over CPU on BFS
  – 3 billion edges per second on one GPU (see chart).
• Roadmap
  – GPU-accelerated vertex-centric graph mining platform.
  – GPU-accelerated graph query

[Figure: Breadth-First Search on Graphs, 10x Speedup on GPUs. Example graph with each vertex labeled by its BFS depth, 0 to 3.]

Page 30

GAS – a Graph-Parallel Abstraction
• Graph-parallel vertex-centric API à la GraphLab
  – “Think like a vertex”
  – Gather: collect information about my neighborhood
  – Apply: update my value
  – Scatter: signal adjacent vertices
• Can write all sorts of graph algorithms this way
  – BFS, PageRank, Connected Components, Triangle Counting, Max Flow, etc.
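To make the Gather/Apply/Scatter steps concrete, here is a minimal single-threaded PageRank written in that style. It is a sketch under simplifying assumptions (in-memory adjacency lists, synchronous rounds, damping factor 0.85, no dangling vertices); it is not the GraphLab or MPGraph API, and `gas_pagerank` is a hypothetical name.

```python
def gas_pagerank(out_edges, damping=0.85, tol=1e-9, max_rounds=100):
    """PageRank written as Gather / Apply / Scatter rounds.

    out_edges maps every vertex to its list of out-neighbors. The sketch
    assumes every vertex has at least one out-edge (no dangling vertices).
    """
    vertices = list(out_edges)
    in_edges = {v: [] for v in vertices}
    for u, nbrs in out_edges.items():
        for v in nbrs:
            in_edges[v].append(u)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    active = set(vertices)  # every vertex starts active
    for _ in range(max_rounds):
        if not active:
            break
        # Gather: each active vertex sums the shares sent by its in-neighbors.
        acc = {v: sum(rank[u] / len(out_edges[u]) for u in in_edges[v]) for v in active}
        # Apply: each active vertex computes its new value from the gathered sum.
        new_rank = {v: (1 - damping) / n + damping * acc[v] for v in active}
        # Scatter: a vertex whose value moved by more than tol signals its out-neighbors.
        next_active = set()
        for v, r in new_rank.items():
            if abs(r - rank[v]) > tol:
                next_active.update(out_edges[v])
        rank.update(new_rank)
        active = next_active
    return rank


ranks = gas_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({v: round(r, 3) for v, r in ranks.items()})
```

The Scatter step is what keeps the work sparse: only vertices whose inputs actually changed are recomputed in the next round.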

Page 31

Triangle Counting on Twitter Graph: 34.8 Billion Triangles
• 64 machines: 15 seconds
• Hadoop [WWW’11]: 1636 machines, 423 minutes

S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11

Why? Wrong abstraction → broadcast O(degree²) messages per vertex.
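Counted per vertex, the natural unit of work is a neighbor-set intersection rather than a broadcast of O(degree²) messages. A minimal sequential sketch, assuming an undirected simple graph given as an edge list: each edge is oriented from the lower to the higher vertex id so that every triangle is counted exactly once. This is a textbook formulation, not the code behind either measurement on the slide.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    higher = defaultdict(set)  # u -> neighbors with a larger id than u
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        higher[a].add(b)
    total = 0
    for u, nbrs in higher.items():
        for v in nbrs:
            # Each common "higher" neighbor w closes the triangle u < v < w exactly once.
            total += len(nbrs & higher.get(v, set()))
    return total


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_triangles(k4))
```

The complete graph on four vertices contains four triangles, which is what this call reports.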

Page 32

GPU Speedups vs GraphLab (SSSP)


Page 33

Graph Mining on GPU Clusters


• 2D partitioning (aka vertex cuts)
• Minimizes the communication volume.
• Batch parallel Gather in row, Scatter in column.
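In a 2D partitioning the adjacency matrix is cut into a grid of blocks, so each edge (u, v) is stored on the processor that owns u's row range and v's column range. A minimal sketch of that assignment, assuming vertex ids 0..n-1 split into equal contiguous ranges over a rows × cols processor grid; `block_of` and `partition_edges` are hypothetical names, and real systems typically also relabel vertices to balance the blocks.

```python
def block_of(u, v, n, rows, cols):
    """Return the (row, col) grid coordinate of the processor that stores edge (u, v).

    Assumes vertex ids 0..n-1, with source ids split evenly over `rows`
    and target ids split evenly over `cols`.
    """
    return u * rows // n, v * cols // n


def partition_edges(edges, n, rows, cols):
    """Group edges by owning processor. All processors in one grid row share a
    source-vertex range; all processors in one grid column share a target range."""
    blocks = {}
    for u, v in edges:
        blocks.setdefault(block_of(u, v, n, rows, cols), []).append((u, v))
    return blocks


edges = [(0, 5), (1, 2), (6, 7), (7, 0), (3, 4)]
blocks = partition_edges(edges, n=8, rows=2, cols=2)
print(blocks)
```

Because a grid row shares one source range and a grid column shares one target range, the per-step exchanges can be confined to a processor's own row and column.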

Page 34

RDF Database

Overview
New features
Integration Points
REST API

Page 35

Bigdata 1.3
• Fast, scalable, open source, standards-compliant database
  – Single machine to 50B triples or quads
  – Plus a dedicated provenance mode.
  – Scales horizontally on a cluster
  – SPARQL 1.1 Query, Property Paths, Update, Federated Query, etc.
  – Native RDFS+ inference.
  – Vectored query engine.
  – High Availability
  – RDF Graph Mining

Page 36

Sesame Sail & Repository APIs
• Java APIs for managing and querying RDF data
• Extension methods for bigdata:
  – Non-blocking readers
  – RDFS+ truth maintenance
  – Semantic / graph search
  – Change history
  – Custom services
  – Etc.

Page 37

NanoSparqlServer
• Easy deployment
  – Embedded or servlet container
  – Java client encapsulates remote operations.
• High performance SPARQL end point
  – Optimized for bigdata MVCC semantics (queries are non-blocking)
  – Built-in resource management
  – Scalable!
• Simple REST API
  – SPARQL 1.1 Query and Update
  – Simple and useful RESTful INSERT/UPDATE/DELETE methods
  – ESTCARD exposes super fast range counts for triple patterns
  – “Explain” a query.
  – Monitor / cancel running queries.
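ESTCARD reports how many statements match a triple pattern without materializing them. The idea can be sketched on a sorted in-memory index: the count is the distance between two binary-search positions, so it costs O(log n) however many triples match. This toy assumes one SPO-ordered list and patterns whose bound terms form a key prefix (a pattern with only the object bound would need a differently ordered index); it does not model bigdata's statement indices or the REST call itself, and `range_count` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


class _High:
    """Sentinel that compares greater than any string, used to close a key range."""

    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return True


HIGH = _High()

# Hypothetical toy statement index in SPO order.
SPO = sorted([
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("alice", "name", "Alice"),
    ("bob", "knows", "carol"),
])


def range_count(index, prefix):
    """Number of triples whose leading terms equal `prefix`.

    Two binary searches bound the key range, so nothing inside the range is read.
    """
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + (HIGH,))
    return hi - lo


print(range_count(SPO, ("alice",)), range_count(SPO, ("alice", "knows")))
```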

Page 38

SPARQL Query
• High-level query language for graph data.
  – Standard (W3C)
• Based on graph pattern matching.
  – SELECT vars WHERE graph-pattern
    • Returns result set (table)
  – CONSTRUCT template WHERE graph-pattern
    • Returns sub-graph
  – DESCRIBE uri
    • Returns graph for that object.
• Database can optimize physical storage and joins
  – Versus navigation-only APIs such as Blueprints

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x ?z
WHERE {
  ?x foaf:knows ?y .
  ?y foaf:knows ?z .
}
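The SELECT query above joins two triple patterns on the shared variable ?y. A minimal sketch of what gets computed, assuming a tiny in-memory list of foaf:knows pairs (the data is invented for illustration): an index on the subject turns the second pattern into one lookup per ?y binding, the kind of physical choice a database can make on the query's behalf.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, object) pairs for foaf:knows.
KNOWS = [("ann", "ben"), ("ben", "cat"), ("ben", "dan"), ("cat", "ann")]


def friends_of_friends(knows):
    """Evaluate SELECT ?x ?z WHERE { ?x foaf:knows ?y . ?y foaf:knows ?z . }"""
    by_subject = defaultdict(list)  # index: ?y -> every ?z with (?y knows ?z)
    for s, o in knows:
        by_subject[s].append(o)
    rows = []
    for x, y in knows:                   # first pattern binds ?x and ?y
        for z in by_subject.get(y, []):  # second pattern probed with ?y bound
            rows.append((x, z))
    return rows


rows = friends_of_friends(KNOWS)
print(rows)
```

Like the SPARQL query (which has no DISTINCT), the result is a bag of (?x, ?z) rows.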

Page 39

BSBM 100M (Single Server)
• Graph shows a series of trials for the BSBM reduced query mix (w/o Q5).
• Metric is Query Mixes per Hour (QMpH). Higher is better.
• The 8-client curve shows JVM and disk warm-up effects. Both are hot for the 16-client curve.
• Occasional low points are GC.
• Apple mini (4 cores, 16G RAM and SSD). Machine is CPU bound at 16 clients. No IO wait.

[Figure: BSBM 100M. QMpH (0 to 60,000) over trials 1 to 51, for 8 clients and 16 clients.]

Page 40

JVM Heap Pressure

JVMs provide fast evaluation (rivaling hand-coded C++) through sophisticated online compilation and auto-tuning.

[Figure: JVM resources vs. application workload, showing application throughput and GC workload]

However, a non-linear interaction between the application workload (object creation and retention rate) and GC running time and cycle time can steal cycles and cause application throughput to plummet.

Page 41

Choose standard or analytic operators
• Easy to specify which
  – URL query parameter or SPARQL query hint
• Java operators
  – Use the managed Java heap.
  – Can sometimes be faster or offer better concurrency
    • E.g., distinct solutions is based on a concurrent hash map
  – BUT
    • The Java heap cannot handle very large materialized data sets.
    • GC overhead can steal your computer
• Analytic operators
  – Scale up gracefully
  – Zero GC overhead.

Page 42

SPARQL Update
• Graph management
  – Create, Add, Copy, Move, Clear, Drop
• Graph data operations
  – LOAD uri
  – INSERT DATA, DELETE DATA
  – DELETE/INSERT
• Can be used as a RULES language, update procedures, etc.

( WITH IRIref )? ( ( DeleteClause InsertClause? ) | InsertClause ) ( USING ( NAMED )? IRIref )* WHERE GroupGraphPattern
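Because a DELETE/INSERT ... WHERE update can derive new triples from existing ones, repeating it until it adds nothing behaves like forward-chaining a rule. A minimal sketch, assuming an in-memory set of triples and hypothetical predicates `:parentOf` and `:ancestorOf`; it shows only the fixpoint idea, not how bigdata evaluates SPARQL Update or its native RDFS+ inference.

```python
ANC = ":ancestorOf"   # hypothetical predicate
PARENT = ":parentOf"  # hypothetical predicate


def rule_pass(triples):
    """One evaluation of
    INSERT { ?x :ancestorOf ?z } WHERE { ?x :ancestorOf ?y . ?y :ancestorOf ?z }
    returning the triples the WHERE clause would produce."""
    anc = [(s, o) for s, p, o in triples if p == ANC]
    return {(x, ANC, z) for x, y in anc for y2, z in anc if y == y2}


def run_to_fixpoint(triples):
    """Repeat the update until it inserts nothing new (naive forward chaining)."""
    triples = set(triples)
    # Seed rule: INSERT { ?x :ancestorOf ?y } WHERE { ?x :parentOf ?y }
    triples |= {(s, ANC, o) for s, p, o in triples if p == PARENT}
    while True:
        new = rule_pass(triples) - triples
        if not new:
            return triples
        triples |= new


closure = run_to_fixpoint({("a", PARENT, "b"), ("b", PARENT, "c"), ("c", PARENT, "d")})
print(sorted((s, o) for s, p, o in closure if p == ANC))
```

This naive loop re-joins everything on each pass; it is meant to show termination at a fixpoint, not an efficient evaluation strategy.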

Page 43

SPARQL UPDATE Extension
• Language extension for SPARQL UPDATE
  – Adds durable solution sets.

INSERT INTO %solutionSet1 SELECT ?product ?reviewer WHERE { … }

• Easy to slice result sets:

SELECT ... { INCLUDE %solutionSet1 } OFFSET 0 LIMIT 1000
SELECT ... { INCLUDE %solutionSet1 } OFFSET 1000 LIMIT 1000
SELECT ... { INCLUDE %solutionSet1 } OFFSET 2000 LIMIT 1000

• Re-group or re-order results:

SELECT ... { INCLUDE %solutionSet1 } GROUP BY ?x
SELECT ... { INCLUDE %solutionSet1 } ORDER BY ASC(?x)

• Expensive JOINs are NOT recomputed.

Page 44

Semantic Search
• Enable in your KB properties:
  – com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
• Simple full text search:

prefix bd: <http://www.bigdata.com/rdf/search#>
select ?s ?o {
  ?o bd:search "mike" . # all literals with "mike" token.
  ?s ?p ?o . # all subjects for those literals.
}

• Lots of options (cosine relevance, rank, match all terms, etc.).
• Low-latency, web-facing applications built by “slicing” the search index.
• See https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=FullTextSearch
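The bd:search pattern above first finds literals containing a token and then joins back to the subjects that use them. A minimal sketch of that two-step shape, assuming an in-memory inverted index, naive lower-case tokenization, and invented data; bigdata's actual text index and its relevance options are not modeled, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample data; in this toy every object is a plain literal.
TRIPLES = [
    ("ex:p1", "foaf:name", "Mike Smith"),
    ("ex:p2", "foaf:name", "Jane Doe"),
    ("ex:p3", "rdfs:comment", "mike likes graphs"),
]


def tokenize(text):
    """Naive tokenizer: lower-cased runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(triples):
    """Inverted index: token -> set of literals containing that token."""
    index = defaultdict(set)
    for _, _, literal in triples:
        for token in tokenize(literal):
            index[token].add(literal)
    return index


def search(triples, index, token):
    """?o bd:search token . ?s ?p ?o .  Returns the (?s, ?o) solutions."""
    literals = index.get(token.lower(), set())
    return [(s, o) for s, _, o in triples if o in literals]


idx = build_text_index(TRIPLES)
print(search(TRIPLES, idx, "mike"))
```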


Page 45

Federated Query & Custom Services
• Language extension for remote services: SERVICE uri { graph-pattern }
• Integrated into the vectored query engine
  – Solutions vectored into, and out of, remote end points.
  – Control evaluation order with query hints:
    • runFirst, runLast, runOnce, etc.
• ServiceRegistry
  – Configure service end point behavior
  – Embed custom “services” (custom indices, monitor transactions, etc.).

SERVICE { …. } hint:Prior hint:runFirst "true" .

Page 46

RDF Graph Mining
• “GAS” SERVICE
  – 2x faster than neo4j, 4x faster than Titan.
  – Handles typed links and link attributes efficiently

PREFIX gas: <http://www.bigdata.com/rdf/gas#>
SELECT ?depth (count(?out) as ?cnt) {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
    gas:program gas:in <ip:/112.174.24.90> . # one or more times, specifies the initial frontier.
    gas:program gas:out ?out . # exactly once - will be bound to the visited vertices.
    gas:program gas:out1 ?depth . # exactly once - will be bound to the depth of the visited vertices.
    gas:program gas:maxIterations 4 . # optional limit on breadth first expansion.
    gas:program gas:maxVisited 2000 . # optional limit on the #of visited vertices.
  }
}
group by ?depth
order by ?depth


depth   count
1       207
2       1,985
3       9,861
4       29,366

SPARQL Query
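The query above runs BFS from one start vertex and then groups the visited vertices by depth. A plain-Python sketch of the same shape of computation, assuming an adjacency-list graph and treating `max_iterations` and `max_visited` as simple cut-offs checked before each expansion; how bigdata's GAS engine applies its maxIterations and maxVisited limits may differ, so this mirrors the query, not its exact semantics. In this sketch the start vertex is reported at depth 0, and the function names are hypothetical.

```python
from collections import Counter


def bfs_depths(adj, start, max_iterations=None, max_visited=None):
    """Return {vertex: depth} for vertices reached by BFS from `start`.

    max_iterations bounds the number of frontier expansions; max_visited stops
    further expansion once at least that many vertices have been visited
    (the last frontier is kept whole, so the limit can be overshot).
    """
    depth = {start: 0}
    frontier = [start]
    level = 0
    while frontier:
        if max_iterations is not None and level >= max_iterations:
            break
        if max_visited is not None and len(depth) >= max_visited:
            break
        level += 1
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in depth:
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth


def depth_histogram(depth):
    """Like: SELECT ?depth (count(?out) as ?cnt) ... group by ?depth order by ?depth"""
    return sorted(Counter(depth.values()).items())


adj = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": ["e"], "d": [], "e": ["f"]}
print(depth_histogram(bfs_depths(adj, "s", max_iterations=4)))
```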

Page 47

Highly Available Replication Cluster

(HAJournalServer)

(bigdata 1.3 release)

Page 48

HA Replication Cluster
• Same backend database (Journal + RWStore)
  – Same data scale as Journal.
  – Same IO profile as Journal.
  – Same low latency query profile as Journal.
• REST API is mandatory
  – SPARQL Query
  – SPARQL UPDATE
  – SPARQL Federated Query
    • Can be used to cross tenant or machine boundaries.
• Writes on leader
  – Identified in zookeeper or using REST API.
• Query on leader or followers
  – Each query is answered 100% locally.
    • Zero coordination overhead.
    • Linear scaling in query throughput


Page 49: SYSTAP,LLC’ - Semanticommunity.infosemanticommunity.info/@api/deki/files/28780/SYSTAP-bigdata-Xcelera... · SYSTAP,LLC’ Graphs’ Graph’Databases’ ... SAP, but is open source

bigdata® Presented at Xcelerate 3/18/2014

High Availability
• Shared nothing architecture
  – Same data on each node
  – Coordinate only at commit
• Scaling
  – 50 billion triples or quads
  – Query throughput scales linearly
• Self healing
  – Automatic failover
  – Automatic resync after disconnect
  – Online single node disaster recovery
• Online Backup
  – Online snapshots (full backups)
  – HA Logs (incremental backups)
• Point in time recovery (offline)


[Diagram: quorum with k=3, size=3: three HAService instances, one leader and two followers.]


High Availability

[Diagram: clients write to the leader of a k=3, size=3 quorum of HAService instances (one leader, two followers).]

• Write on the leader. Read on any service.
• Writes replicated using low-level transfers.
• Quorum fully consistent at each commit.


High Availability

[Diagram: clients read from any of the three HAService instances (leader or followers) in the k=3, size=3 quorum.]


Self-Healing

[Diagram: quorum with k=3 running at size=2 (one leader, one follower); a third HAService synchronizes the missing delta and joins the quorum.]

Service can fail for a variety of reasons:
• JVM down
• Machine down
• Network partition
• Zookeeper timeout
• Discovery failure
• Wrong commit point
• Severe clock skew

• Goal is to guarantee eventual consistency without allowing intermediate illegal states.
• Persistent state of the service must remain self-consistent.
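The self-healing diagram shows a k=3 quorum still running with size=2 while the third service resynchronizes. Assuming a simple-majority rule (common for replicated quorums; the slide itself does not state the exact bigdata rule), the arithmetic looks like this. The function name is hypothetical.

```python
def quorum_met(k, joined):
    """Assumed simple-majority rule: a quorum with replication factor k
    (k odd) is met when at least (k + 1) // 2 services are joined."""
    if k < 1 or k % 2 == 0:
        raise ValueError("replication factor k must be a positive odd number")
    return joined >= (k + 1) // 2


for size in (3, 2, 1):
    print(f"k=3, size={size}: met={quorum_met(3, size)}")
# k=3, size=3: met=True
# k=3, size=2: met=True
# k=3, size=1: met=False
```

Under this rule, k=3 tolerates one failed service; a second failure breaks the quorum until a service resyncs and rejoins.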


HA  Load  Balancer  


[Diagram: a client request reaches the load balancer proxy running in the Jetty container and is forwarded to the HAService leader or to another HAService.]

• Transparent proxy
• Writes proxied to the leader.
• Reads load balanced over quorum.
  – Host metrics (CPU, IO Wait)
  – Service metrics (GC Time)
• Custom policies
  – Tenant aware
  – Low & high latency pools
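Writes always go to the leader, so the balancing decision only matters for reads. A minimal sketch of a metric-driven read policy, assuming each service reports CPU utilization, IO wait and GC time as fractions. The `ServiceMetrics` type, the scoring formula and the weights are all hypothetical; the real load balancer's policy interface may look quite different.

```python
from dataclasses import dataclass


@dataclass
class ServiceMetrics:
    name: str
    cpu_util: float      # host metric: fraction of CPU busy, 0.0-1.0
    io_wait: float       # host metric: fraction of time in IO wait, 0.0-1.0
    gc_time_frac: float  # service metric: fraction of recent time in GC, 0.0-1.0


def pick_read_target(services, w_cpu=1.0, w_io=1.0, w_gc=2.0):
    """Route a read to the service with the lowest weighted load score.

    The weights are illustrative assumptions, not bigdata defaults.
    """
    def score(m):
        return w_cpu * m.cpu_util + w_io * m.io_wait + w_gc * m.gc_time_frac

    return min(services, key=score).name


quorum = [
    ServiceMetrics("leader", cpu_util=0.80, io_wait=0.10, gc_time_frac=0.05),
    ServiceMetrics("follower1", cpu_util=0.30, io_wait=0.20, gc_time_frac=0.20),
    ServiceMetrics("follower2", cpu_util=0.40, io_wait=0.05, gc_time_frac=0.01),
]
print(pick_read_target(quorum))  # follower2
```

Weighting GC time more heavily is one way to steer reads away from a JVM that is pausing; a tenant-aware or latency-pool policy would add further inputs to the score.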


HA  Deployment  


• EC2
  – SSD instance types (IOPS, ephemeral)
  – Snapshots and logs on EBS (durable)
  – Restore from EBS on instance restart
  – Coming soon: click-start HA clusters on EC2
• Private clouds
  – OpenStack
  – chef/puppet

[Diagram: three HAService instances.]


HA  Service  Architecture  

[Diagram: HA replication cluster. Application clients use a unified API over the REST API (SPARQL, GAS, etc.), exchanging SPARQL XML, SPARQL JSON, RDF/XML, N-Triples, N-Quads, Turtle, TriG and RDF/JSON. Each of three nodes runs a ServiceStarter hosting an HAJournalServer (Jetty with the REST API and load balancer, the HAGlue API, and the Journal) together with a lookup service and class server. A Zookeeper ensemble coordinates the cluster.]


BSBM 100M (3-Node HA Cluster)

[Chart: Query Performance Scales Linearly with Cluster Size. BSBM 100M, 3-node replication cluster using Intel Mac Minis. y-axis: QMpH (0 to 160,000); series: Aggregate Throughput and Per-node Throughput.]

• 3-Node, Shared-Nothing Replication Cluster
  – 3x 2011 Mac Mini (4 cores, 16G RAM and SSD).
• Query Scales Linearly
• CPU bound
  – 70-90k QMpH on newer servers.


HA DEMO
• Brief demonstration of an HA-3 cluster
• Start 3 services (A, B, C)
• Load some data (TBL foaf crawl)

DROP ALL;
LOAD <file:/Users/bryan/Documents/workspace/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/resources/data/foaf/data-0.nq.gz>;
LOAD <file:/Users/bryan/Documents/workspace/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/resources/data/foaf/data-1.nq.gz>;
LOAD <file:/Users/bryan/Documents/workspace/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/resources/data/foaf/data-2.nq.gz>;

  – Identical data on all services
SELECT (count(*) as ?c) { ?s ?p ?o } LIMIT 1

• Self-healing
  – Shut down C
  – Load more data on A + B. Commit.
LOAD <file:/Users/bryan/Documents/workspace/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/resources/data/foaf/data-3.nq.gz>;
  – Restart C. It resyncs and joins the quorum.
• Snapshot service.

File                  Quads       Size (gz)
timbl/data-0.nq.gz    89          2.5K
timbl/data-1.nq.gz    16516       293K
timbl/data-2.nq.gz    87250       1.2M
timbl/data-3.nq.gz    388412      5.1M
timbl/data-4.nq.gz    9405528     113M
timbl/data-5.nq.gz    93898523    1016M
timbl/data-6.nq.gz    101010423   1.2G
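The demo issues SPARQL UPDATE (DROP ALL, LOAD) against the cluster. A minimal sketch of building such a request over HTTP, assuming the server accepts the standard SPARQL 1.1 Protocol "update via URL-encoded POST" form (an `update` form parameter). The endpoint URL and the `file:/tmp/...` path below are hypothetical placeholders; in an HA setup the request would target the leader, or a proxy that forwards writes to it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_update_request(endpoint, update):
    """Build a SPARQL 1.1 Protocol URL-encoded POST carrying an update."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = sparql_update_request(
    "http://localhost:8080/bigdata/sparql",  # hypothetical endpoint URL
    "DROP ALL; LOAD <file:/tmp/data-0.nq.gz>;",  # hypothetical file path
)
print(req.get_method(), req.full_url)  # POST http://localhost:8080/bigdata/sparql
# To send it against a running server: urllib.request.urlopen(req)
```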


Bigdata®  

Services  and  dynamics  


Related  “Graph”  Technologies  


Redpoint repositions existing technology. MPGraph compares favorably with high end hardware solutions from YARC, Oracle, and SAP, but is open source and uses commodity hardware.

Pair  up  bigdata  and  MPGraph  

STTR  


bigdata® is a federation of services

[Diagram: application clients use a unified API (SPARQL XML, SPARQL JSON, RDF/XML, N-Triples, N-Quads, Turtle, TriG, RDF/JSON) to talk to client services, which provide RDF data and SPARQL query management functions. Distributed index management and query run on many data services, supported by the registrar, Zookeeper, shard locator, transaction manager and load balancer.]


Typical Software Stack

[Diagram, top to bottom: application layer; unified API and unified API implementation; API frameworks (Spring, etc.) and the Sesame framework (SAIL, RDF, SPARQL); bigdata RDF database; bigdata component services; Java; OS (Linux). Cluster and storage management (HTTP, Jini, Zookeeper) spans the stack.]


Service Discovery (1/2)
• Services discover registrars. Registrar discovery is configurable using either unicast or multicast protocols.
• Services advertise themselves and look up other services (a).
• Clients use the shard locator to locate key-range shards for scale-out indices (b).

[Diagram: client service, shard locator, registrar and data services connected by an RMI / NIO bus; (a) service registration and discovery through the registrar; (b) shard lookup through the shard locator.]
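Step (b) is a range lookup: given a key, find the shard whose key range contains it. A minimal sketch, assuming the locator keeps shards sorted by their left separator key, with the first shard starting at the empty key (`[]` on the later slides) and the last one open-ended (∞). The class, shard names and data service names are hypothetical.

```python
from bisect import bisect_right


class ShardLocator:
    """Hypothetical map from key ranges to (shard, data service)."""

    def __init__(self, left_keys, locations):
        # left_keys[i] is the inclusive lower bound of shard i. The first one
        # must be b"" (the empty key) so that every key falls in some shard.
        if not left_keys or left_keys[0] != b"" or left_keys != sorted(left_keys):
            raise ValueError("left keys must be sorted and start with b''")
        self.left_keys = list(left_keys)
        self.locations = list(locations)

    def locate(self, key):
        # Python compares bytes as unsigned lexicographic sequences, matching
        # the unsigned byte[] key order of the indices.
        return self.locations[bisect_right(self.left_keys, key) - 1]


locator = ShardLocator(
    [b"", b"g", b"p"],
    [("p1", "DataService1"), ("p2", "DataService2"), ("p3", "DataService3")],
)
print(locator.locate(b"apple"))  # ('p1', 'DataService1')
print(locator.locate(b"g"))      # ('p2', 'DataService2')
print(locator.locate(b"zebra"))  # ('p3', 'DataService3')
```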


Service Discovery (2/2)
• Clients resolve shard locators to data service identifiers (a), then look up the data services in the service registrar (b).
• Data moves directly between clients and services (c).
• Service protocols are not limited to RMI. Custom NIO protocols provide high data throughput.
• Client libraries encapsulate this for applications, including caching of service lookup and shard resolution.

[Diagram: the client service (a) queries the shard locator, (b) performs service lookup in the registrar, and (c) exchanges control messages and data directly with the data services over the RMI / NIO bus.]


Persistence Store Mechanisms
• Read/Write (RW) store
  – Efficiently recycles allocation slots on the backing file
  – Used by services that need persistence without sharding, HA, etc.
  – Also used in the scale-up single machine database (~50B triples)
• Write Once, Read Many (WORM) store
  – Append-only, log structured store (aka “journal”)
  – Used by the data services to absorb writes
  – Plays an important role in the scale-out database architecture
• Index segment (SEG) store
  – Read-optimized B+Tree files
  – Generated by bulk index build operations on a data service
  – Plays an important role in dynamic sharding and analytic query

[Diagram: RWStore journal, WORM journal, index segments.]
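The RW store's defining property is that freed allocation slots are reused instead of the file growing forever. A toy sketch of that idea, assuming fixed slot size classes with one free list per class. The class name and size classes are illustrative, and the real RWStore allocator is considerably more involved.

```python
class SlotAllocator:
    """Toy size-class allocator that recycles freed slots (hypothetical)."""

    SIZE_CLASSES = (64, 128, 256, 512, 1024)

    def __init__(self):
        self.file_end = 0  # next unused byte offset in the backing file
        self.free = {size: [] for size in self.SIZE_CLASSES}

    def _size_class(self, nbytes):
        for size in self.SIZE_CLASSES:
            if nbytes <= size:
                return size
        raise ValueError(f"record of {nbytes} bytes exceeds the largest slot")

    def alloc(self, nbytes):
        size = self._size_class(nbytes)
        if self.free[size]:
            return self.free[size].pop()  # recycle a previously freed slot
        addr = self.file_end              # otherwise extend the file
        self.file_end += size
        return addr

    def release(self, addr, nbytes):
        self.free[self._size_class(nbytes)].append(addr)


rw = SlotAllocator()
a = rw.alloc(100)   # 128-byte slot at offset 0
b = rw.alloc(50)    # 64-byte slot at offset 128
rw.release(a, 100)
c = rw.alloc(120)   # reuses offset 0; the file does not grow
print(a, b, c, rw.file_end)  # 0 128 0 192
```

A WORM journal, by contrast, never reuses space: every write simply appends at `file_end`.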


The Data Service (1/2)

[Diagram: clients scatter writes across the data services and gather reads from them; each data service overflows its journal into index segments.]

Append-only journals and read-optimized index segments are the basic building blocks.


The Data Service (2/2)

Behind the scenes on a data service.

• Journal
  – Block append for ultra fast writes
  – Target size on disk ~200MB.
• Overflow
  – Fast synchronous overflow
  – Asynchronous index builds
  – Key to dynamic sharding
• Index Segments
  – 98%+ of all data on disk
  – Target shard size on disk ~200MB
  – Bloom filters (fast rejection)
  – At most one IO per leaf access
  – Multi-block IO for leaf scans

[Diagram: data service journal overflowing into index segments.]


Bigdata®  Indices  

Scale-out B+Tree and Dynamic Sharding


Bigdata® Indices
• Dynamically key-range partitioned B+Trees for indices
  – Index entries (tuples) map unsigned byte[] keys to byte[] values.
  – A “deleted” flag and a timestamp are used for MVCC and dynamic sharding.
• Index partitions distributed across data services on a cluster
  – Located by centralized metadata service

[Diagram: B+Tree. Root node C (separator key 7) references nodes A and B; A (separator key 4) and B (separator key 9) reference leaves a, b, c and d, which hold keys 1 to 10 with values v1 to v10.]
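Each tuple carries a “deleted” flag and a timestamp so that readers can see the index as of a commit point. A minimal sketch of such a read, assuming an in-memory list of versions per key appended in increasing timestamp order. The class and method names are hypothetical and say nothing about bigdata's on-disk tuple format.

```python
from collections import defaultdict


class VersionedIndex:
    """Keys map to (timestamp, deleted, value) versions (hypothetical)."""

    def __init__(self):
        # key -> versions, assumed appended in increasing timestamp order
        self.versions = defaultdict(list)

    def write(self, key, value, ts):
        self.versions[key].append((ts, False, value))

    def delete(self, key, ts):
        # A delete is a new version with the "deleted" flag set, so readers
        # at earlier commit points still see the old value.
        self.versions[key].append((ts, True, None))

    def read(self, key, as_of):
        """Value visible at commit time `as_of`, or None if absent/deleted."""
        visible = None
        for ts, deleted, value in self.versions.get(key, []):
            if ts > as_of:
                break
            visible = None if deleted else value
        return visible


idx = VersionedIndex()
idx.write(b"k1", "v1", ts=10)
idx.write(b"k1", "v2", ts=20)
idx.delete(b"k1", ts=30)
print(idx.read(b"k1", 15), idx.read(b"k1", 25), idx.read(b"k1", 35))  # v1 v2 None
```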


Dynamic Key Range Partitioning

[Diagram: p0 splits into p1 and p2; p1 and p2 join into p3; p3 moves and becomes p4.]

Splits break down the shards as the data scale increases.

Moves redistribute shards onto existing or new nodes in the cluster.

Joins merge shards when data scale decreases.
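Splits and joins are driven by shard size on disk. A minimal sketch of that decision and of a two-way split at the median key, assuming a shard is just a sorted list of keys plus a byte size. The split threshold follows the “Shard will split at 200MB” bullet later in the deck; the join threshold and all names are hypothetical.

```python
MB = 1024 * 1024
SPLIT_BYTES = 200 * MB  # ~200MB split point, per the shard evolution slide
JOIN_BYTES = 50 * MB    # hypothetical lower threshold for joins


def plan_shard(shard_bytes):
    """Decide what to do with a shard, given its size on disk."""
    if shard_bytes > SPLIT_BYTES:
        return "split"
    if shard_bytes < JOIN_BYTES:
        return "join"  # merge with a key-range neighbour
    return "keep"


def split_two_way(sorted_keys):
    """Split a shard's keys at the median.

    Returns (left_keys, right_keys, separator); the right-hand shard covers
    keys >= separator.
    """
    if len(sorted_keys) < 2:
        raise ValueError("need at least two keys to split")
    mid = len(sorted_keys) // 2
    return sorted_keys[:mid], sorted_keys[mid:], sorted_keys[mid]


print(plan_shard(250 * MB), plan_shard(100 * MB), plan_shard(10 * MB))  # split keep join
print(split_two_way([b"a", b"c", b"f", b"k", b"q"]))
# ([b'a', b'c'], [b'f', b'k', b'q'], b'f')
```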


Dynamic Key Range Partitioning

[Diagram: the shard locator service maps shard p0, covering ([], ∞), to DataService1.]

Initial conditions place the first shard, representing the entire index key range, on an arbitrary data service.


Dynamic Key Range Partitioning

[Diagram: the shard locator service maps p0 ([], ∞) to DataService1.]

Writes cause the shard to grow. Eventually its size on disk exceeds a preconfigured threshold.


Dynamic Key Range Partitioning

[Diagram: the shard locator service; the initial shard ([], ∞) on DataService1 is split into shards p1 through p9.]

Instead of a simple two-way split, the initial shard is “scatter-split” so that all data services can start managing data.
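A scatter split differs from a two-way split in the number of pieces and in where they go. A minimal sketch, assuming the initial shard's keys are available in sorted order and the resulting shards are assigned round-robin to the data services. In the system described here, placement is the load balancer's job, so the round-robin assignment and the function name are purely illustrative.

```python
def scatter_split(sorted_keys, n_shards, data_services):
    """Cut a key range into n_shards roughly equal pieces (hypothetical).

    Returns a list of (left_separator, data_service); the first separator is
    b"" so the pieces still cover the whole ([], ∞) key range.
    """
    if not 1 <= n_shards <= len(sorted_keys):
        raise ValueError("need between 1 and len(sorted_keys) shards")
    step = len(sorted_keys) / n_shards
    separators = [b""] + [sorted_keys[round(i * step)] for i in range(1, n_shards)]
    return [(sep, data_services[i % len(data_services)])
            for i, sep in enumerate(separators)]


keys = [bytes([c]) for c in b"abcdefghi"]
print(scatter_split(keys, 3, ["DataService1", "DataService2", "DataService3"]))
# [(b'', 'DataService1'), (b'd', 'DataService2'), (b'g', 'DataService3')]
```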


Dynamic Key Range Partitioning

The newly created shards are then dispersed around the cluster. Subsequent splits are two-way, and moves occur based on relative server load (decided by the Load Balancer Service).

[Diagram: the shard locator service records p1 on DataService1, p2 on DataService2, …, p9 on DataService9.]


Shard Evolution

[Diagram: successive builds from journal0 produce index segments p0, p1 and p2; a merge compacts them into p3.]

Builds generate index segments from just the old journal.

Merge compacts the shard view into a single index segment.


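Shard Evolution
• Initial journal on the data service (DS).
• Incremental build of new segments for a shard with each journal overflow.
• Shard periodically undergoes a compacting merge.
• Shard will split at 200MB.

[Diagram: timeline t0 … tn, tn+1. Writes go to the current journal (journal0 … journaln, journaln+1); reads span the current journal and segments seg 0 … seg n; each overflow builds a new segment (seg n) from the old journal, and a merge compacts the view.]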
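A shard view is the live journal plus its index segments, searched newest first, and a compacting merge collapses that view into a single segment. A minimal sketch, assuming each component is a dict and deletes are recorded with a sentinel value. The function names are hypothetical, and a real merge must also retain any history still needed by MVCC readers, which this sketch ignores.

```python
DELETED = object()  # sentinel standing in for a tuple with the "deleted" flag


def read_view(view, key):
    """Look up a key in a shard view ordered newest first (journal, seg n, ..., seg 0)."""
    for component in view:
        if key in component:
            value = component[key]
            return None if value is DELETED else value
    return None


def compacting_merge(view):
    """Fold the whole view into a single segment, newest values winning."""
    merged = {}
    for component in reversed(view):  # oldest first, so newer entries overwrite
        merged.update(component)
    return {k: v for k, v in merged.items() if v is not DELETED}


seg0 = {b"a": 1, b"b": 2}
seg1 = {b"b": 20, b"c": 3}
journal = {b"c": DELETED, b"d": 4}
view = [journal, seg1, seg0]
print([read_view(view, k) for k in (b"a", b"b", b"c", b"d")])  # [1, 20, None, 4]
print(compacting_merge(view))  # {b'a': 1, b'b': 20, b'd': 4}
```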


Bulk  Data  Load  

High  Throughput  with  Dynamic  Sharding  


Bulk Data Load
• Very high data load rates
  – 1B triples in under an hour (better than 300,000 triples per second on a 16 node cluster).
• Executed as a distributed job
  – Read data from a file system, the web, HDFS, etc.
• Database remains available for query during load
  – Read from historical commit points with snapshot isolation.
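The two load-rate figures above are consistent with each other. At 300,000 triples per second, loading one billion triples takes:

```latex
t = \frac{10^{9}\ \text{triples}}{3 \times 10^{5}\ \text{triples/s}}
  \approx 3{,}333\ \text{s} \approx 55.6\ \text{min} < 60\ \text{min}
```

Conversely, finishing in exactly one hour requires 10^9 / 3,600 ≈ 277,800 triples per second.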


Distributed Bulk Data Loader

Application client identifies RDF data files to be loaded.

One client is elected as the job master and coordinates the job across the other clients.

Clients read directly from shared storage.

Writes are scattered across the data service nodes.

[Diagram: application client → client services → data services.]


Input (2.5B triples)

[Chart: 16 node cluster loading at 180k triples per second. y-axis: told triples (billions, 0 to 3.0); x-axis: minutes.]


Throughput (2.5B triples)

[Chart: 16 node cluster, 2.5B triples. y-axis: triples per second (0 to 200,000); x-axis: minutes.]


Disk Utilization (~56 bytes/triple)

[Chart: Bytes Under Management. y-axis: bytes on disk (billions, 0 to 200); x-axis: minutes.]
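At the stated ~56 bytes per triple, the full 2.5B-triple load should occupy roughly:

```latex
2.5 \times 10^{9}\ \text{triples} \times 56\ \tfrac{\text{bytes}}{\text{triple}}
  = 1.4 \times 10^{11}\ \text{bytes} \approx 140\ \text{GB}\ (\approx 130\ \text{GiB})
```

which lies within the 0 to 200 billion byte range plotted for bytes under management.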


Shards over Time (2.5B triples)

[Chart: total shard count on the cluster over time for 2.5B triples. y-axis: shards (0 to 800); x-axis: minutes.]


Dynamic Sharding in Action

[Chart: shard builds, merges, and splits for 2.5B triples. y-axis: #of operations (0 to 16,000); x-axis: minutes; series: Split, Build, Merge.]


Scatter Splits in Action (Zoom on first 3 minutes)


Vectored Query in Scale-Out



bigdata®

Bryan Thompson, SYSTAP, LLC

[email protected]    


http://www.systap.com/bigdata.htm http://www.bigdata.com/blog


Backup  Material  



Statement Identifiers

RDF Graphs with efficient link attributes



Statement Level Metadata

• Important to know where data came from in a mashup:
      :mike :memberOf :SYSTAP .
          dc:source <http://www.systap.com> .

• But you CANNOT say that in RDF.


RDF “Reification”

• Creates a “model” of the statement:
      _:s1 rdf:subject :mike .
      _:s1 rdf:predicate :memberOf .
      _:s1 rdf:object :SYSTAP .
      _:s1 rdf:type rdf:Statement .

• Then you can say:
      _:s1 dc:source <http://www.systap.com> .
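Standard RDF reification describes one statement with four statements about a new resource, and the annotation is then a fifth. A minimal sketch that generates them, assuming terms are plain strings in Turtle syntax and the caller supplies the blank-node label. The function name is hypothetical.

```python
def reify(s, p, o, node="_:s1"):
    """Return the four RDF reification triples describing (s, p, o)."""
    return [
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        (node, "rdf:type", "rdf:Statement"),
    ]


triples = reify(":mike", ":memberOf", ":SYSTAP")
triples.append(("_:s1", "dc:source", "<http://www.systap.com>"))
for t in triples:
    print(" ".join(t), ".")
# _:s1 rdf:subject :mike .
# _:s1 rdf:predicate :memberOf .
# _:s1 rdf:object :SYSTAP .
# _:s1 rdf:type rdf:Statement .
# _:s1 dc:source <http://www.systap.com> .
```

Note that the four reification triples only describe the statement; they do not assert `:mike :memberOf :SYSTAP` itself.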


Statement Identifier Mode (SIDs)

• Special database mode
  – SIDs look just like blank nodes
  – Leverages the named graph of the statement

• SIDs let you do exactly what you want:
      :mike :memberOf :SYSTAP _:s1 .
      _:s1 dc:source <http://www.systap.com> .

• Use SIDs in SPARQL:
      select ?s ?o ?source
      where {
        GRAPH ?sid { ?s :memberOf ?o } .
        ?sid dc:source ?source .
      }


Reification Done Right

• Extends the concept to support quads.
  – Outcome from Dagstuhl 2012 Semantic Data Management workshop.
  – Collaborative effort with SYSTAP, OpenLink, Humboldt University, Karlsruhe Institute of Technology.
  – W3C Member Submission in preparation.
  – Harmonized with RDF model theory & SPARQL algebra.
  – Efficient in index structures and queries.
• Extensions for N3, TURTLE, and SPARQL are proposed.
• Interchange and query for link attributes (graph databases).


Works with triples or quads
• Inline statements into statements.
      << :mike :memberOf :SYSTAP >>
          dc:source <http://www.systap.com> .

• Same syntax works for query:
      select ?s ?o ?source
      where { << ?s :memberOf ?o >> dc:source ?source . }

• Standardized approach for:
  – Link attributes (graph databases).
  – Confidence measures (entity / link extractors).
  – Datum level security models.
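Conceptually, `<< s p o >>` lets a whole statement act as the subject of another statement. A minimal in-memory sketch of that data model, assuming statements are Python tuples used directly as dictionary keys. It illustrates the idea only, not bigdata's index encoding; the class name and the `:confidence` predicate are hypothetical.

```python
from collections import defaultdict


class StatementAttributes:
    """Hypothetical store of attributes attached to whole statements."""

    def __init__(self):
        # (s, p, o) statement -> set of (predicate, value) attributes
        self.attrs = defaultdict(set)

    def annotate(self, stmt, p, o):
        """Record: << s p o >> p o ."""
        self.attrs[stmt].add((p, o))

    def match(self, p, attr):
        """Rows for: select ?s ?o ?v where { << ?s p ?o >> attr ?v . }"""
        return sorted(
            (s, o, v)
            for (s, sp, o), annotations in self.attrs.items() if sp == p
            for (ap, v) in annotations if ap == attr
        )


g = StatementAttributes()
g.annotate((":mike", ":memberOf", ":SYSTAP"), "dc:source", "<http://www.systap.com>")
g.annotate((":mike", ":memberOf", ":SYSTAP"), ":confidence", "0.9")
print(g.match(":memberOf", "dc:source"))
# [(':mike', ':SYSTAP', '<http://www.systap.com>')]
```

Because the statement itself is the key, a link attribute costs one entry per attribute rather than the four extra triples of classic reification.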
