Towards Temporal Graph Management and Analytics
IBM Research – Industries & Solutions – Business Solutions & Mathematical Sciences
Recent Updates on IBM System G — GraphBIG and Temporal Data
Yinglong Xia IBM T.J. Watson Research Center Yorktown Heights, NY 10598
© 2014 International Business Machines Corporation
Using LDBC-SNB for GraphBIG
• GraphBIG = graph benchmark suite from IBM System G and GaTech HPArch
• A wide selection of workloads for both CPU and GPU
• Workloads ranging from graph traversal to Gibbs sampling on Bayesian networks
• Illustrates the impact of processor architecture using hardware performance counters
• Fix the input data and implementation
• Show performance profiling at the processor architecture level
Beyond the Benchmarking for Graph DBs
• Graph computing was barely considered in architecture design
• Increasing motivation due to the popularity of graph analytics
• Measuring the impact of architecture requires fixed input data and a fixed analytic implementation
Demanding Graph
• Interactions of entities in many big data applications are naturally modeled by property graphs
• The evolution of graph structure and properties over time usually provides useful information, which needs to be maintained for queries or analytics
• The graph analytics market is growing fast, as are graph data size and complexity, yet near-real-time response is typically required
Xiaoyan Fu, Seok-Hee Hong, Nikola S. Nikolov, Xiaobin Shen, Ying Xin Wu and Kai Xu, Visualization and Analysis of Email Networks, Proceedings of APVIS 2007, IEEE, pp.1-8, 2007.
Use Case: Forensic Analysis on Individual Status
• Recover the dynamics of individual status
• Evaluate status measures, anomalies, etc.
• Propagate known status measures
• Estimate labels for each person at each time stamp
• Aggregate the received measures
Chain Graph: A collection of graphs on contiguous time steps
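
The chain graph definition above can be sketched as a small container that holds one graph snapshot per contiguous time step. This is only an illustration: the class name `ChainGraph`, its methods, and the choice of undirected edge sets are assumptions made for this sketch, not the System G data model.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # undirected edge between two vertex names (assumption)


class ChainGraph:
    """Hypothetical container: one graph snapshot per contiguous time step."""

    def __init__(self) -> None:
        self.snapshots: List[Set[Edge]] = []

    def add_snapshot(self, edges: Set[Edge]) -> int:
        """Append the graph for the next time step and return its index."""
        self.snapshots.append(set(edges))
        return len(self.snapshots) - 1

    def neighbors(self, vertex: str, t: int) -> Set[str]:
        """Vertices adjacent to `vertex` in the snapshot at time step t."""
        out: Set[str] = set()
        for u, v in self.snapshots[t]:
            if u == vertex:
                out.add(v)
            elif v == vertex:
                out.add(u)
        return out

    def degree_history(self, vertex: str) -> List[int]:
        """Degree of `vertex` at every time step, one simple status measure."""
        return [len(self.neighbors(vertex, t))
                for t in range(len(self.snapshots))]
```

Calling `degree_history` on a person gives one value per time stamp, which is the kind of per-step status measure the use case refers to.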
Use Case: Bitemporal Data Exploration
• Support the valid dimension and the transaction dimension
• Audit trail of what you knew and when you knew it
• History of how history, from a business perspective, was stored in the database
http://bitemporalmodeling.com/temporal-data-blog/
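
The two dimensions above can be illustrated with a minimal sketch in which every fact carries a valid-time interval and a transaction-time interval. The record layout, the `lookup` helper, the integer time stamps, and the `FOREVER` sentinel are all assumptions made for illustration; they do not describe how System G stores bitemporal data.

```python
from dataclasses import dataclass
from typing import List, Optional

FOREVER = 10**9  # sentinel for an open-ended interval (assumption of this sketch)


@dataclass
class BitemporalFact:
    key: str
    value: str
    valid_from: int  # when the fact became true in the real world
    valid_to: int    # exclusive end of validity
    tx_from: int     # when the database recorded this version
    tx_to: int       # exclusive; when this version was superseded


def lookup(facts: List[BitemporalFact], key: str,
           valid_at: int, known_at: int) -> Optional[str]:
    """Return the value valid at `valid_at`, as the database knew it at `known_at`."""
    for f in facts:
        if (f.key == key
                and f.valid_from <= valid_at < f.valid_to
                and f.tx_from <= known_at < f.tx_to):
            return f.value
    return None


# At transaction time 10 the database believed "analyst" from time 1 onward.
# At transaction time 20 a correction arrived: "manager" since time 5.
facts = [
    BitemporalFact("alice", "analyst", 1, FOREVER, 10, 20),
    BitemporalFact("alice", "analyst", 1, 5, 20, FOREVER),
    BitemporalFact("alice", "manager", 5, FOREVER, 20, FOREVER),
]
```

Querying the same valid time with two different `known_at` values shows the audit trail: before the correction the answer for time 7 is "analyst", and after it the answer is "manager".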
Graph Data Management
Sparksee, Neo4j, Titan
Organization of Graph Store
Organize Temporal Graph Data

Name                          Default Value
vertex_history                Disabled
num_vertex_property_bundles   0
edge_history                  Disabled
num_edge_property_bundles     0
…                             …

Vertex Record Table (record accessed by VID * sizeof(VtxRec)):
Flag (uint8), inEdge (uint64), inEdge Count (uint16), outEdge (uint64), outEdge Count (uint16), Property (uint64), Property Count (uint64), History (uint64), …

inEdge List:
Prev, Edge_list_buffer<EID,VID,LID>, …
inEdgeCount * sizeof(<EID,VID,LID>) points to the buffer end

Edge Record Table (record accessed by EID * sizeof(EdgeRec)):
Flag, Property, PropertyCount, History, …

Vertex Property Table:
Prev, property_buffer, …
PropertyCount points to the buffer end

Edge Property Table:
Prev, property_buffer, …

Local Configuration
Name      Default Value
min_VID   0
max_VID
min_EID   0
max_EID
…         …
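
The slide notes that a vertex record is located at VID * sizeof(VtxRec). The sketch below shows that addressing scheme with Python's `struct` module. It models only the eight fields named on the slide; the elided fields are not modeled, and the unpadded little-endian layout, the field order, and the helper names are assumptions of this sketch, not the actual System G format.

```python
import struct

# Assumed layout: uint8, uint64, uint16, uint64, uint16, uint64, uint64, uint64,
# little-endian and unpadded. The real record contains further elided fields.
VTX_REC = struct.Struct("<BQHQHQQQ")
FIELDS = ("flag", "in_edge", "in_edge_count", "out_edge",
          "out_edge_count", "property", "property_count", "history")


def write_vertex(table: bytearray, vid: int, **values: int) -> None:
    """Store a record at byte offset vid * sizeof(VtxRec); missing fields are 0."""
    offset = vid * VTX_REC.size
    VTX_REC.pack_into(table, offset, *(values.get(name, 0) for name in FIELDS))


def read_vertex(table: bytearray, vid: int) -> dict:
    """Load the record of vertex `vid` directly from its computed offset."""
    offset = vid * VTX_REC.size
    return dict(zip(FIELDS, VTX_REC.unpack_from(table, offset)))


# A table with room for four vertices (VIDs 0 to 3).
table = bytearray(VTX_REC.size * 4)
write_vertex(table, 2, flag=1, in_edge=4096, in_edge_count=3)
record = read_vertex(table, 2)
```

Because every record has the same size, looking up a vertex is a single multiplication and a read, with no index structure needed.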
Pointer Jumping in Temporal Graph Inference
• Converting a chain graph (temporal graph) into a tridiagonal system
• Forward Gaussian elimination by propagation
• Backward substitution to produce solutions
• A parallel solution to the Thomas algorithm
• Apply pointer jumping to the Thomas algorithm
• Logarithmic speedup
• Propagate belief among vertices within and across time stamps
Parallel solution to a tridiagonal linear system
Speedup w.r.t. Gaussian elimination: T^3 / log T
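
The two ingredients named on this slide can be sketched separately. `thomas_solve` is the textbook sequential Thomas algorithm (forward elimination, then backward substitution), written under the assumption that no pivoting is needed. `pointer_jumping_ranks` shows pointer jumping on a plain chain, simulated sequentially, to illustrate why a chain of T steps needs only about log T rounds. How the talk combines the two into a parallel tridiagonal solver is not shown here; both function names are hypothetical.

```python
from typing import List


def thomas_solve(lower: List[float], diag: List[float],
                 upper: List[float], rhs: List[float]) -> List[float]:
    """Solve a tridiagonal system; lower[0] and upper[-1] are unused.

    Assumes no pivoting is required (e.g. a diagonally dominant matrix).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    # Forward elimination: propagate from the first row to the last.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Backward substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def pointer_jumping_ranks(succ: List[int]) -> List[int]:
    """Distance of every node to the end of a chain via pointer jumping.

    succ[i] is the successor of node i; the last node points to itself.
    Each round doubles the span of every pointer. The rounds run
    sequentially here, but the updates within one round are independent.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Synchronous step: both new lists are built from the old ones.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For example, `thomas_solve([0, -1, -1], [2, 2, 2], [-1, -1, 0], [1, 0, 1])` solves a 3-by-3 system whose exact solution is all ones, and `pointer_jumping_ranks([1, 2, 3, 4, 4])` ranks a five-node chain in two doubling rounds.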