Towards Temporal Graph Management and Analytics

IBM Research – Industries & Solutions – Business Solutions & Mathematical Sciences Recent Updates on IBM System G — GraphBIG and Temporal Data Yinglong Xia IBM T.J. Watson Research Center Yorktown Heights, NY 10598


Post on 05-Aug-2015



© 2014 International Business Machines Corporation


Using LDBC-SNB for GraphBIG

• GraphBIG = Graph Benchmark Suite from IBM System G and GaTech HPArch
• A wide selection of workloads for both CPU and GPU
• Workloads ranging from graph traversal to Gibbs sampling on Bayesian networks
• Illustrating processor architecture impact using hardware performance counters

• Fix input data and implementation
• Show performance profiling at processor architecture level


Beyond the Benchmarking for Graph DBs

• Graph computing was barely considered in architecture design
• Increasing motivation due to popularity of graph analytics
• Measuring the impact of architecture requires fixed input data and a fixed analytic implementation


Demanding Graph

• Interactions of entities in many big data applications are naturally modeled by property graphs
• Evolution of graph structure and properties over time usually provides useful information, which needs to be maintained for query or analytics
• The graph analytics market is growing fast, as are graph data size and complexity, yet near-real-time response is typically required

Xiaoyan Fu, Seok-Hee Hong, Nikola S. Nikolov, Xiaobin Shen, Ying Xin Wu and Kai Xu, Visualization and Analysis of Email Networks, Proceedings of APVIS 2007, IEEE, pp.1-8, 2007.


Use Case: Forensic Analysis on Individual Status

• Recover the dynamics of individual status
• Evaluate status measures, anomalies, etc.
• Propagate known status measures
• Estimate labels for each person at each time stamp
• Aggregate the received measures

Chain Graph: A collection of graphs on contiguous time steps
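
To make the chain-graph definition concrete, here is a minimal sketch in Python. The class name `ChainGraph`, its methods, and the choice to link each vertex to its own copies in the adjacent time steps are assumptions for illustration; the slides do not show how System G represents this structure.

```python
from collections import defaultdict


class ChainGraph:
    """Hypothetical container: one undirected graph snapshot per time step."""

    def __init__(self, num_steps):
        # snapshots[t] maps a vertex to the set of its neighbors at step t.
        self.snapshots = [defaultdict(set) for _ in range(num_steps)]

    def add_edge(self, t, u, v):
        """Add an undirected edge (u, v) to the snapshot at time step t."""
        self.snapshots[t][u].add(v)
        self.snapshots[t][v].add(u)

    def neighbors(self, t, v):
        """Return (time, vertex) pairs reachable from vertex v at step t.

        Assumption: besides edges inside snapshot t, a vertex is linked to
        itself in steps t-1 and t+1 when it appears there, which is what
        lets a measure propagate across contiguous time steps.
        """
        result = [(t, u) for u in sorted(self.snapshots[t].get(v, ()))]
        for s in (t - 1, t + 1):
            if 0 <= s < len(self.snapshots) and v in self.snapshots[s]:
                result.append((s, v))
        return result
```

With three time steps, an edge a-b at step 0 and an edge a-c at step 1, `neighbors(1, "a")` yields the in-snapshot neighbor `(1, "c")` plus the temporal link `(0, "a")`.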


Use Case: Bitemporal Data Exploration

• Support the valid dimension and the transaction dimension
• Audit trail of what you knew and when you knew it
• History of how history, from a business perspective, was stored in the database

http://bitemporalmodeling.com/temporal-data-blog/
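
The two time dimensions can be illustrated with a small append-only sketch in Python. Everything here (`Version`, `BitemporalProperty`, `record`, `as_of`) is a hypothetical name chosen for the example, and the model is deliberately simplified: each version carries a valid interval and the transaction time at which it was stored, and a correction is a new version rather than an overwrite.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: float  # start of the period the fact is true in the real world
    valid_to: float    # end of that period (exclusive)
    tx_time: float     # when the database learned about this fact


class BitemporalProperty:
    """Hypothetical append-only store for one property of one entity."""

    def __init__(self):
        self._versions = []

    def record(self, value, valid_from, valid_to, tx_time):
        # Nothing is overwritten: corrections are appended as new versions.
        self._versions.append(Version(value, valid_from, valid_to, tx_time))

    def as_of(self, valid_time, tx_time):
        """What did the database believe at tx_time about valid_time?"""
        known = [v for v in self._versions
                 if v.tx_time <= tx_time
                 and v.valid_from <= valid_time < v.valid_to]
        # Among the versions already stored, the latest one wins.
        return max(known, key=lambda v: v.tx_time).value if known else None


status = BitemporalProperty()
status.record("employee", 1, math.inf, tx_time=10)
# Correction stored later: the person was a contractor during [1, 5).
status.record("contractor", 1, 5, tx_time=20)
status.record("employee", 5, math.inf, tx_time=20)
```

Querying `status.as_of(3, 15)` returns the belief held before the correction, while `status.as_of(3, 25)` returns the corrected value; keeping both answers available is the audit trail the slide refers to.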


Graph Data Management

• Sparksee
• Neo4j
• Titan


Organization of Graph Store


Organize Temporal Graph Data

Name                         Default Value
vertex_history               Disabled
num_vertex_property_bundles  0
edge_history                 Disabled
num_edge_property_bundles    0
…                            …

Vertex Record Table (a vertex record is accessed at VID * sizeof(VtxRec))

Field           Type
Flag            uint8
inEdge          uint64
inEdge Count    uint16
outEdge         uint64
outEdge Count   uint16
Property        uint64
Property Count  uint64
History         uint64

inEdge List

• Prev, Edge_list_buffer<EID,VID,LID>
• inEdgeCount * sizeof(<EID,VID,LID>) points to the buffer end

Edge Record Table (an edge record is accessed at EID * sizeof(EdgeRec))

• Flag, Property, PropertyCount, History, …

Vertex Property Table

• Prev, property_buffer
• PropertyCount points to the buffer end

Edge Property Table

• Prev, property_buffer

Local Configuration

Name     Default Value
min_VID  0
max_VID
min_EID  0
max_EID
…        …
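
The fixed-size record addressing on this slide (a vertex record found at VID * sizeof(VtxRec)) can be sketched with Python's `struct` module. The field order and widths follow the slide; the packed little-endian layout without padding, the names `VTX_REC`, `vertex_offset`, `write_vertex` and `read_vertex`, and the use of an in-memory `bytearray` as the table are assumptions for illustration, not the actual System G storage format.

```python
import struct

# Field order from the slide: Flag (uint8), inEdge (uint64),
# inEdge Count (uint16), outEdge (uint64), outEdge Count (uint16),
# Property (uint64), Property Count (uint64), History (uint64).
# "<" selects little-endian with no padding (an assumption).
VTX_REC = struct.Struct("<BQHQHQQQ")


def vertex_offset(vid):
    """Byte offset of a vertex record: VID * sizeof(VtxRec)."""
    return vid * VTX_REC.size


def write_vertex(table, vid, fields):
    """Store the eight record fields for vertex `vid` in the table."""
    VTX_REC.pack_into(table, vertex_offset(vid), *fields)


def read_vertex(table, vid):
    """Fetch the record of vertex `vid` in constant time, with no index."""
    return VTX_REC.unpack_from(table, vertex_offset(vid))


# A table with room for four vertex records (VIDs 0..3).
table = bytearray(VTX_REC.size * 4)
write_vertex(table, 3, (1, 100, 2, 200, 3, 300, 4, 0))
```

Because every record has the same size, a lookup by VID is a single multiplication and one read; the variable-length parts (edge lists, properties, history) stay in separate buffers that the record only points to.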


Pointer Jumping in Temporal Graph Inference

• Converting a temporal (chain) graph into a tridiagonal system
• Forward Gaussian elimination by propagation
• Backward substitution to produce solutions

• A parallel solution to the Thomas algorithm, which solves a tridiagonal linear system
• Apply pointer jumping to the Thomas algorithm
• Logarithmic speedup

• Propagate belief among vertices within and across time stamps

Speedup wrt Gaussian elimination: T^3 / log T
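
For reference, the two building blocks named on this slide can be sketched separately in Python. `thomas_solve` is the standard serial Thomas algorithm (forward elimination, then backward substitution). `pointer_jump_ranks` shows pointer jumping on its textbook use, list ranking, with each parallel round simulated sequentially. How the talk combines the two into a parallel solver is not shown in the transcript, so that combination is not attempted here; the function names and argument conventions are assumptions.

```python
def thomas_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    a, b, c, d are lists of equal length n; a[0] and c[n-1] are ignored.
    Assumes no pivot becomes zero (e.g. a diagonally dominant system).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: each row depends only on the previous one.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Backward substitution.
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pointer_jump_ranks(next_ptr):
    """Distance of every node to the tail of a linked list.

    next_ptr[i] is the successor of node i; the tail points to itself.
    Each loop iteration stands for one parallel round in which every node
    adds its successor's rank and then jumps over that successor, so the
    number of rounds grows logarithmically with the list length.
    """
    n = len(next_ptr)
    nxt = list(next_ptr)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds
```

The serial loops in `thomas_solve` form a chain of dependencies of length n, which is the kind of chain that pointer jumping shortens to a logarithmic number of rounds.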


Comments and Questions?