when graph meets big data: opportunities and...

30
When Graph Meets Big Data: Opportunities and Challenges Yinglong Xia Huawei Research America 11/13/2016 The International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16) High Performance Graph Data Management and Processing (HPGDM 2016)

Upload: others

Post on 04-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

When Graph Meets Big Data: Opportunities and Challenges

Yinglong XiaHuawei Research America11/13/2016

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16)

High Performance Graph Data Management and Processing (HPGDM 2016)

Page 2: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

2

Introduction

Page 3: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

3

Page 4: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

4

Recent Growth

Revenue

Net ProfitsCash flow http://www.huawei.com/en/about-huawei

Huawei has business in over 170 countries, with 150,000 employees, approximately 70,000 of which are engaged in Research & development. Huawei operates a global network of 14 regional headquarters, 16 R&D Centers, 28 Innovation Centers jointly operated with customers, and 45 Training Centers.

Page 5: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

5

Graph Analytics Basics

Page 6: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

6

Brief History

N.T. Bliss, Confronting the Challenges of Graphs and Networks, Lincoln Laboratory Journal, 2013

2016

Neuronal network @ Human Brain Project 89 billion V & 100 trillion E

61.6 million V1.47 billion E

40 million V300 million E

Page 7: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

Import properties/metrics:- Small-world effect- Betweenness- Eccentricity/Centrality- Transitivity- Resilience- Community structure- Clustering coefficient- Matching index

7

Complex Network AnalysisReal world complex networks include WWW, Social Network, Biological network, Citation Network, Power Grid, Food Web, Metabolic network, etc.

Complex network models:- Poisson random graph

- degree~Poisson- Small world effect

- Watts and Strogatz graph- Transitivity- Small world effect

- Barabasi and Albert graph- Small world- Power law

Page 8: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

8

Diversity in Graph TechnologyDynamic graph helps analyze thespatial and temporal influence overthe entities in the network

RDF graph enables knowledge inference over linked data

Streaming graph monitorssentiment propagation overtime and how the graph structure can impact

Property graph is widely used as a data storage model to manage the properties of entities as well as the interconnections

Vertex ID

Edge label

Edge property

Graph technology leads to rich analytic abilities

Graphical models leverages statistics to inference latentfactors in a complex system

Page 9: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

9

Industrial Use Cases

Page 10: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

10

Financial Risk Management• 56 million small business owners in China• 1/3 want to finance their ventures• Weight to 24.3% of the GDP• 16.3% received loads from banks• 3/4 citizens in China has no credit history

Identifying Fraud by Graph

Analyze one’s social connections

Motivation

Credit defaultswap record

Visit gamble forum

Called lier in someone’s comment

• Strong variables• Weak variables• Individual regression• collective regression

Traditional credit scoring organizations

Emerging credit scoring organizations

• Customer profiling• Precise engagement• Anti-fraud control• Credit scoring in realtime• Post-load management• Lost-customer engage

Credit Scoring

Samsung Group cross-ownership structure in Gephi network visualization

Auditing such labyrinthian beasts is enough to bring a grown auditor to tears. This is where network ‘graph analytics’ can be invaluable

Auditing

Emerging approach

Traditional approach

Page 11: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

11

Public Security

• Knowledge graph• Heterogeneous data• Information integration• Complex search• Security analysis

Insider Thread Solution

• Graph is natural to link data from different sources for exploration

• Graph can combine with probabilistic models for inference

• Graph helps find anomalous behaviors

Advantage of Graph for Security

Use Case -1 Use Case -2

Page 12: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

12

Telecom FraudIn May 2016, Communications Fraud Control Association (CFCA) and the Forum for International Irregular Network Access (FIINA), operators ranging from AT&T, Vodafone, Korea Telecom to Orange and Deutsche Telekom shed light on how old and newer forms of fraud are detected and combatted

Anti-Fraud platform proposed by TMForum

Page 13: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

13

Existing Graph Systems

Page 14: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

14

Some Existing Products

Visualization

Analytics

Frameworks

Storage

ScaleGraph

Flink/Gelly

Page 15: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

15

Neo4j System Architecture and Storage FormatTraversals Core API Cypher

Vertex/Edge Cache Thread local diffs

FS Cache HA

Record filesTransaction log

Disk

Graph structure and data buffers

i.e. mmap

LFU-protocol

Link edges inclined to a vertex using the relationship data structure, imposing some performance issue for handling celebrates in power-law graphs e.g. social network

Neo’s declarative query language

for TX roll-back

changes in a TX

High Availability based on TX

Easy to implement horizontal partitioning in FS

Page 16: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

16

Titan System Architecture and Storage Format

Store Manager

Transaction store

Relations

Index Store

Page 17: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

17

OrientDB System ArchitectureGraph JDBC

DocDB based storage

Support distributed platforms, offering key-value store, docDB, and graphDB in one system

Page 18: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

18

Glance at Graph Computing EnginesSpark/GraphX

GraphChi

Page 19: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

19

Graph in ONOS

HotSDN’2014

Page 20: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

20

Challenges

Page 21: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

21

Challenges - Performance● Understand performance bottleneck by

breaking down the execution time● Bottleneck comes from the memory sub-system

● DTLB is inefficient● Cache performs well● Cache MPKI rate is high

Core graph algorithms from 21 real-world use cases

3 different types of graph computing, with focus on structural traversal, property processing, and graph editing, respectively

Page 22: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

22

Challenges — Input Sensitivity● Impact from graph topology

● Power-law graph results in imbalanced workload due to dense vertices

● Dense subgraph, sparse backbone

● Dense subgraph can be converted into matrices

● Iterative update in a subgraph● Road net is easy to decompose

● Property type matters● More time spent on property

management● Computing performance can be

negatively impacted

Performance is inconsistent across different graph types

Page 23: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

23

Challenges - Impact of H/W Accelerator● GPU can be helpful

● Sufficient acceleration by GPU● Requires re-design of the

algorithms

● Challenges● Data must be transferred to GPU● Cost of Host to Device data transfer● Difficulty in putting large graph into

GPU (Double buffering)● Sensitive to input graph data

Speedup of NVIDIA Tesla K40 over 16-core Intel Xeon E5-2670

Memory divergency shows higher sensitivity for graph computing on GPU

Page 24: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

24

Challenges - Scale-out Issue● Poor data locality and difficult partitioning result in

challenges in scaling out the computing ● Scale-out challenges can be

seen in Graph500 analysis● Single machine with big

memory can help● Must be cautious to use

many computing nodes

degraded performance when #core is 100~1000

Analysis of data from Graph500

*from Peter Kogge

Page 25: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

25

Breakthrough

Page 26: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

26

Graph Platform for Smart Big Data

Infrastructure

Data Management

Graph engines

Visualization

Analytics

Single Machine Cluster GPU Server Cloud

Structure Management

PropertyManagement

Metadata Management

Permission Control

Basic Engine

Streaming Graph Graphical Model Hyper Graph

Bayes NetCommunity

Label propagationCentrality

Anomaly detection

Matching

Ego Feature

Max Flow

Dynamic Graph Vis Property Vis Large Graph Vis

Incremental Update

Page 27: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

27

Unified Graph Data Access Patterns1

2

3

4

5

6

1 2 3 4 5 60.3

0.2

1.4

0.5 0.6

0.8

0.4

0.3

0.8

0.2

1.9

0.6

0.9 1.20.3

1.1equivalent

src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8

src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9

src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1

shard 1 (1, 2) shard 2 (3,4) shard 3 (5,6)

src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8

src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9

src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1

src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8

src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9

src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1

1

2

3

4

5

6

0.3

0.2

1.4

0.5 0.6

0.8

0.4

0.3

0.8

0.2

1.9

0.6

0.9 1.2

0.3

1.1

1

2

3

4

5

6

0.3

0.2

1.4

0.5 0.6

0.8

0.4

0.3

0.8

0.2

1.9

0.6

0.9 1.2

0.3

1.1

step

1st

ep 2

step

3

obse

rvat

ion

on P

SW d

ata

acce

ss

patte

rns

insp

ires

high

ly e

ffici

ent

shar

ding

repr

esen

tatio

n

Itera

tion i

Page 28: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

28

Experiments

Performance improvement of SSSP against GraphChi including data ingestion

Execution time breakdown on Pagerank running Twitter-2010

Execution time breakdown on Pagerank running Twitter-2010

Disk read bandwidth over time by GraphChi and EdgesSet. Edge-Set showed up to 2x aggregate bandwidth and more constant IO usage.

Page 29: When Graph Meets Big Data: Opportunities and …hpgdmp.bsc.es/system/files/uploads/Xia_GraphInBigData...2016/11/13  · When Graph Meets Big Data: Opportunities and Challenges Yinglong

29

Opportunities in Graph Technology for Big Data● Develop high performance graph computing kernels and primitives

• Graph500 technique based architecture-awareness for graph computing• Heterogeneous computing and computing near-data technology

● Reinvent graph technology for supporting cognitive computing• One open platform with multiple graph and graph-related technologies• Integral consideration on graphical model, streaming graphs, etc. for AI/IoT

● Offer vertical solutions to break through separation among technique stacks• Holistic solution for rapidly building industry-level graph analytics solutions• Incorporating with market segmentation, such as security, finance, etc.

● Collaborations and Standardization• Foster collaboration with relevant professional communities to educate the market• Developing domain or cross-domain standardizations