Ling Liu Part 02: Big Graph Processing


Page 1: Ling liu part 02:big graph processing

Ling Liu
School of Computer Science, College of Computing

Part II: Distributed Graph Processing

Page 2: Ling liu part 02:big graph processing

Big Data Trends

Big Data: Volume, Velocity, Variety

• 1 zettabyte = a trillion gigabytes (10^21 bytes) (Cisco, 2012)
• 500 million tweets per day
• 100 hours of video are uploaded every minute

Page 3: Ling liu part 02:big graph processing

Why Graphs?

Graphs are everywhere: social network graphs, road networks, national security, business analytics, biological graphs

• Friendship Graph (Facebook Engineering, 2010)
• Brain Network (The Journal of Neuroscience, 2011)
• US Road Network (www.pavementinteractive.org)
• Web Security Graph (McAfee, 2013)
• Intelligence Data Model (NSA, 2013)

Page 4: Ling liu part 02:big graph processing

How Big?

Social scale: 1 billion vertices, 100 billion edges
• 111 PB adjacency matrix
• 2.92 TB adjacency list
• 2.92 TB edge list
Twitter graph from the Gephi dataset (http://www.gephi.org)

Web scale: 50 billion vertices, 1 trillion edges
• 271 EB adjacency matrix
• 29.5 TB adjacency list
• 29.1 TB edge list
Internet graph from the Opte Project (http://www.opte.org/maps); web graph from the SNAP database (http://snap.stanford.edu/data)

Brain scale: 100 billion vertices, 100 trillion edges
• 2.08 mN_A bytes (molar bytes) adjacency matrix, where N_A = 6.022 × 10^23 mol^-1 (about 1.1 ZB)
• 2.84 PB adjacency list
• 2.84 PB edge list
Human connectome: Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011

Source: Paul Burkhardt and Chris Waring, "An NSA Big Graph experiment" (NSA-RD-2013-056001v1)
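As a quick sanity check on these figures, here is a small back-of-the-envelope calculation. It is only a sketch under stated assumptions (1 bit per adjacency-matrix cell, roughly 32 bytes per stored edge, binary units) — constants the listed numbers work out to, not values given on the slide itself.

```scala
// Back-of-the-envelope check of the "social scale" figures above.
// Assumptions: 1 bit per adjacency-matrix cell, ~32 bytes per edge
// in the adjacency list, binary (1024-based) units.
object GraphSizeEstimate {
  def main(args: Array[String]): Unit = {
    val vertices = 1e9                // 1 billion vertices
    val edges    = 1e11               // 100 billion edges

    val bytesPerPB = math.pow(2, 50)
    val bytesPerTB = math.pow(2, 40)

    val matrixPB = vertices * vertices / 8 / bytesPerPB  // ≈ 111 PB
    val listTB   = edges * 32 / bytesPerTB               // ≈ 2.9 TB

    println(f"adjacency matrix ≈ $matrixPB%.0f PB, adjacency list ≈ $listTB%.2f TB")
  }
}
```

The same arithmetic with 50 billion vertices reproduces the 271 EB web-scale matrix, which is why adjacency matrices are ruled out at these scales.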

Page 5: Ling liu part 02:big graph processing

Big Graph Data: Technical Challenges

Huge and growing size
• Requires massive storage capacities
• Graph analytics usually requires much bigger computing and storage resources

Complicated correlations among data entities (vertices)
• Make it hard to parallelize graph processing (hard to partition)
• Most existing big data systems are not designed to handle such complexity

Skewed distribution (i.e., high-degree vertices)
• Makes it hard to ensure load balancing

Page 6: Ling liu part 02:big graph processing

Parallel Graph Processing: Challenges

• Structure-driven computation
  – Storage and data transfer issues
• Irregular graph structure and computation model
  – Storage and data/computation partitioning issues
  – Partitioning vs. load/resource balancing

Page 7: Ling liu part 02:big graph processing

Parallel Graph Processing: Opportunities

• Extend existing paradigms
  – Vertex-centric
  – Edge-centric
• Build new frameworks for parallel graph processing
  – Single-machine solutions: GraphLego [ACM HPDC 2015], GraphTwist [VLDB 2015]
  – Distributed approaches: GraphMap [IEEE SC 2015], PathGraph [IEEE SC 2014]

Page 8: Ling liu part 02:big graph processing

Build New Graph Frameworks: Key Requirements/Challenges

• Less pre-processing
• Low and load-balanced computation
• Low and load-balanced communication
• Low memory footprint
• Scalable with respect to cluster size and graph size
• General graph processing framework for large collections of graph computation algorithms and applications

Page 9: Ling liu part 02:big graph processing

Graph Operations: Two Distinct Classes

Iterative graph algorithms
• Each execution consists of a set of iterations
• In each iteration, vertex (or edge) values are updated
• All (or most) vertices participate in the execution
• Examples: PageRank, shortest paths (SSSP), connected components
• Systems: Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix

Graph pattern queries
• Subgraph matching problem
• Requires fast query response time
• Explores a small fraction of the entire graph
• Examples: friends-of-friends, triangle patterns
• Systems: RDF-3X, TripleBit, SHAPE

(Related publications: VLDB 2014, IEEE SC 2015)

Page 10: Ling liu part 02:big graph processing

Distributed Approaches to Parallel Graph Processing

SHAPE: a distributed RDF system with semantic hash partitioning
• Graph pattern queries
• Semantic hash partitioning
• Distributed RDF query processing
• Experiments

GraphMap: scalable iterative graph computations
• Iterative graph computations
• GraphMap approaches
• Experiments

Page 11: Ling liu part 02:big graph processing

What Are Iterative Graph Algorithms?

Iterative graph algorithms
• Each execution consists of a set of iterations
• In each iteration, vertex (or edge) values are updated
• All (or most) vertices participate in the operations
• Examples: PageRank, shortest paths (SSSP), connected components
• Systems: Google's Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix

Figures: SSSP and Connected Components (source: AMPLab)
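To make the vertex-centric iteration model concrete, here is a minimal single-source shortest paths (SSSP) sketch using GraphX's Pregel operator, one of the systems listed above. The input path and source vertex are placeholders, edge weights are simply set to 1.0, and a SparkContext `sc` (e.g., from spark-shell) is assumed.

```scala
import org.apache.spark.graphx._

// Load an edge list and give every edge unit weight (placeholder path).
val raw   = GraphLoader.edgeListFile(sc, "hdfs:///data/graph.txt")
val graph = raw.mapEdges(_ => 1.0)

val sourceId: VertexId = 1L   // hypothetical source vertex
val init = graph.mapVertices((id, _) =>
  if (id == sourceId) 0.0 else Double.PositiveInfinity)

// Each iteration: vertices keep the minimum distance seen so far and
// propagate improved distances along out-edges; vertices receiving no new
// messages become inactive, which is exactly the iterative pattern above.
val sssp = init.pregel(Double.PositiveInfinity)(
  (id, dist, newDist) => math.min(dist, newDist),          // vertex program
  triplet =>
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    else Iterator.empty,                                    // send messages
  (a, b) => math.min(a, b)                                  // merge messages
)

sssp.vertices.take(5).foreach(println)
```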

Page 12: Ling liu part 02:big graph processing

Why Is Iterative Graph Processing So Difficult?

Huge and growing size of graph data
• Makes it hard to store and handle the data on a single machine

Poor locality (many random accesses)
• Each vertex depends on its neighboring vertices, recursively

Huge size of intermediate data in each iteration
• Requires additional computing and storage resources

Heterogeneous graph algorithms
• Different algorithms have different computation and access patterns

High-degree vertices
• Make it hard to ensure load balancing

Page 13: Ling liu part 02:big graph processing

The Problems of Current Computation Models

Page 14: Ling liu part 02:big graph processing

The Problems of Current Computation Models

• "Ghost" vertices maintain adjacency structure and replicate remote data
• Too many interactions among partitions

(Figure: partitions with "ghost" vertices)

Page 15: Ling liu part 02:big graph processing

Iterative Graph Algorithm Example: Connected Components

Figure source: Apache Flink

Page 16: Ling liu part 02:big graph processing

Why Don't We Use MapReduce?

Of course, we can use MapReduce!

The first iteration of Connected Components for this graph would be:

Map (each vertex sends its current label to its neighbors):
• From vertex 1: (2, 1)
• From vertex 2: (1, 2), (3, 2), (4, 2)
• From vertex 3: (2, 3), (4, 3)
• From vertex 4: (2, 4), (3, 4)

Reduce (each vertex keeps the minimum of its own label and the labels it received):
• Key 1: values {2} → min 1
• Key 2: values {1, 3, 4} → min 1
• Key 3: values {2, 4} → min 2
• Key 4: values {2, 3} → min 2
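The same round can be sketched with plain Scala collections, assuming the four-vertex graph implied by the table above (edges 1-2, 2-3, 2-4, 3-4) and label propagation toward the minimum vertex ID; a real job would of course run as a Hadoop MapReduce or similar distributed job.

```scala
// One MapReduce-style iteration of Connected Components.
val edges  = Seq((1L, 2L), (2L, 3L), (2L, 4L), (3L, 4L))
val labels = Map(1L -> 1L, 2L -> 2L, 3L -> 3L, 4L -> 4L)  // initial label = own ID

// Map: each edge endpoint sends its current label to the other endpoint;
// every vertex also emits its own label so it survives the reduce.
val mapped = edges.flatMap { case (u, v) =>
  Seq((u, labels(v)), (v, labels(u)))
} ++ labels.toSeq

// Reduce: every vertex keeps the minimum label it has seen.
val next = mapped.groupBy(_._1).map { case (v, kvs) => v -> kvs.map(_._2).min }

// next == Map(1 -> 1, 2 -> 1, 3 -> 2, 4 -> 2), matching the table above.
```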

Page 17: Ling liu part 02:big graph processing

Why We Shouldn't Use MapReduce

But... in a typical MapReduce job, disk I/O is performed in four places (reading the input, spilling map output, shuffling intermediate data, and writing the reduce output).

So 10 iterations mean disk I/O in 40 places.

Figure source: http://arasan-blog.blogspot.com/

Page 18: Ling liu part 02:big graph processing

Related Work

Distributed memory-based systems
• Messaging-based: Google Pregel, Apache Giraph, Apache Hama
• Vertex mirroring: GraphLab, PowerGraph, GraphX
• Dynamic load balancing: Mizan, GPS
• Graph-centric view: Giraph++

Disk-based systems using a single machine
• Vertex-centric model: GraphChi
• Edge-centric model: X-Stream
• Vertex-edge centric: GraphLego

With external memory
• Out-of-core capabilities (Apache Giraph, Apache Hama, GraphX)
• Not optimized for graph computations
• Users need to configure several parameters

Page 19: Ling liu part 02:big graph processing

Two Research Directions for Iterative Graph Processing Systems

Disk-based systems on a single machine
• Load a part of the input graph in memory
• Include a set of data structures and techniques to efficiently load graph data from disk
• Examples: GraphChi, X-Stream, ...
• Disadvantages: 1) relatively slow, 2) resource limitations of a single machine

Distributed memory-based systems on a cluster
• Load the whole input graph in memory
• Load all intermediate results and messages in memory
• Examples: Pregel, Giraph, Hama, GraphLab, GraphX, ...
• Disadvantages: 1) very high memory requirement, 2) coordination of distributed machines

Page 20: Ling liu part 02:big graph processing

Main Features

GraphMap
• A distributed iterative graph computation framework that effectively utilizes secondary storage
• Goal: reduce the memory requirement of iterative graph computations while ensuring competitive (or better) performance

Main contributions
• Clear separation between mutable and read-only data
• Two-level partitioning technique for locality-optimized data placement
• Dynamic access methods based on the workloads of the current iteration

Page 21: Ling liu part 02:big graph processing

Clear Data Separation

Graph data is split into:
• Vertices and their data (mutable)
• Edges and their data (read-only)

The read-only edge data is read in each iteration.

Page 22: Ling liu part 02:big graph processing

Locality-Based Data Placement on Disk

Edge access locality
• All edges (out-edges, in-edges, or bi-edges) of a vertex are accessed together to update its vertex value
  ⇒ We place all connected edges of a vertex together on disk

Vertex access locality
• All vertices in a partition are accessed by the same worker (processor) in every iteration
  ⇒ We store all vertices in a partition, and their edges, in contiguous disk blocks to utilize sequential disk accesses (see the sketch below)
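As an illustration only (not GraphMap's actual storage schema), both localities can be realized on a sorted key-value store such as HBase, which the experiments later use, with a composite row key of (partition ID, vertex ID): the rows of one partition, and each vertex's edge list, then sit in contiguous, sequentially scannable blocks.

```scala
// Hypothetical layout sketch: read-only edge data keyed by (partitionId, vertexId).
case class EdgeList(vertexId: Long, outEdges: Array[Long])   // read-only edge data

def rowKey(partitionId: Int, vertexId: Long): Array[Byte] = {
  val buf = java.nio.ByteBuffer.allocate(4 + 8)
  buf.putInt(partitionId).putLong(vertexId)  // lexicographic key order groups a partition together
  buf.array()
}

// A range scan over one partition's key prefix then becomes a sequential read
// of that worker's vertices and their edge lists.
```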

How can you access disk efficiently for each iteration?

Page 23: Ling liu part 02:big graph processing

Dynamic Access Methods

For various workloads: is the current workload larger than the threshold?
• YES → sequential disk accesses
• NO → random disk accesses

The threshold is dynamically configured based on actual access times, for each iteration and for each worker.

[Chart: number of active vertices (x1000) per iteration (0–9) for PageRank, CC, and SSSP]
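A minimal sketch of the selection logic described above follows; the names and the threshold-update rule are illustrative assumptions, not GraphMap's actual implementation. The idea: use sequential scans when the active-vertex workload is large, random point reads otherwise, and recalibrate the threshold from access times measured in the previous iteration.

```scala
// Illustrative sketch of per-worker dynamic access-method selection.
sealed trait AccessMethod
case object Sequential extends AccessMethod   // scan the whole partition from disk
case object Random     extends AccessMethod   // point-read only the active vertices

def choose(activeVertices: Long, threshold: Long): AccessMethod =
  if (activeVertices > threshold) Sequential else Random

// Recalibrate using measured times from the previous iteration: the break-even
// number of active vertices where a full sequential scan costs about the same
// as per-vertex random reads.
def recalibrate(seqScanTimeMs: Double, randomReadMsPerVertex: Double): Long =
  math.max(1L, math.round(seqScanTimeMs / randomReadMsPerVertex))
```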

Page 24: Ling liu part 02:big graph processing

Experiments

First prototype of GraphMap
• BSP engine and messaging engine: utilizes Apache Hama
• Disk storage: utilizes Apache HBase (a two-dimensional key-value store)

Settings
• Cluster of 21 machines on Emulab
  – 12 GB RAM, Xeon E5530, 500 GB and 250 GB SATA disks
  – Connected via a 1 GigE network
• HBase (ver. 0.96) on HDFS of Hadoop (ver. 1.0.4)
• Hama (ver. 0.6.3)

Iterative graph algorithms
• 1) PageRank (10 iterations), 2) SSSP, 3) CC

Page 25: Ling liu part 02:big graph processing

Execution Time

Analysis
• Hama fails for large graphs with more than 900M edges, while GraphMap still works
• Note that, in all cases, GraphMap is faster (up to 6 times) than Hama, which is an in-memory system

Page 26: Ling liu part 02:big graph processing

Breakdown of GraphMap Execution Time

Figures: PageRank, SSSP, and CC on uk-2005

Analysis
• For PageRank, all iterations have similar results except the first and the last
• For SSSP, iterations 5–15 utilize sequential disk accesses based on our dynamic selection
• For CC, random disk accesses are selected from iteration 24 onward

Page 27: Ling liu part 02:big graph processing

Effects of Dynamic Access Methods

Analysis
• GraphMap chooses the optimal access method in most of the iterations
• Possible further improvement through fine-tuning in iterations 5 and 15
• For cit-Patents, GraphMap always chooses random accesses because only 3.3% of the vertices are reachable from the start vertex, and thus the number of active vertices is always small

[Chart: computation time (sec) per iteration (0–40) for the sequential, random, and dynamic access methods]

Page 28: Ling liu part 02:big graph processing

Comparing GraphMap with Other Systems

• Existing distributed graph systems are all in-memory systems. In addition to Hama, we give a relative comparison with a few other representative systems.

GraphMap setting: 12 GB DRAM per node on a cluster of 21 nodes, i.e., 252 GB of distributed shared memory; the compared systems use roughly 5x the DRAM per node.

Page 29: Ling liu part 02:big graph processing

Social LiveJournal (LJ) Graph Dataset

Vertices: members; edges: friendship

Graph datasets (stored in HDFS):
• cit-Patents (raw size: 268 MB): 3.8M vertices, 16.5M edges
• soc-LiveJournal1 (raw size: 1.1 GB): 4.8M vertices, 69M edges

Page 30: Ling liu part 02:big graph processing

Our Initial Experience with Spark / GraphX

• Cluster setting
  – 6 machines (1 master & 5 slaves)
• Spark setting
  – Spark shell (i.e., we did NOT implement any Spark application yet)
    • Built-in PageRank function of GraphX
  – All 40 cores (= 8 cores × 5 slaves)
  – Portion of memory for RDD storage: 0.52 (by default)
    • If we assign 512 MB to each executor, about 265 MB is dedicated to RDD storage
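For reference, invoking the built-in GraphX PageRank from the Spark shell looks roughly like the following sketch (the HDFS path is a placeholder; `sc` is the SparkContext provided by spark-shell):

```scala
import org.apache.spark.graphx.GraphLoader

// Load soc-LiveJournal1 as 40 edge partitions and run 10 PageRank iterations.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/soc-LiveJournal1.txt",
                                     numEdgePartitions = 40)
val ranks = graph.staticPageRank(numIter = 10).vertices
ranks.take(5).foreach(println)
```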

Page 31: Ling liu part 02:big graph processing

User-Defined Partitioning (soc-LiveJournal1)

• Implemented two hash partitioning techniques using the GraphX API: 1) by source vertex IDs, 2) by destination vertex IDs (a sketch of such a custom partition strategy follows the tables below)

5 GB slave RAM & 40 partitions (loading; 1st, 2nd, and 3rd runs of PageRank, 10 iterations each):
• Hashing by src. vertices: loading 10s (2.1GB); 1st 196s (7.2GB) (12.3 / 13.2GB); 2nd 156s (7.2GB) (12.1 / 13.2GB); 3rd OOM
• Hashing by dst. vertices: loading 10s (2.1GB); 1st 168s (5.8GB) (12.5 / 13.2GB); 2nd OOM
• EdgePartition2D: loading 10s (2.1GB); 1st 199s (8GB) (12.1 / 13.2GB); 2nd OOM
• EdgePartition1D: loading 13s (2.1GB, outlier?); 1st 188s (6.9GB) (12.2 / 13.2GB); 2nd 173s (7.2GB) (12.3 / 13.2GB); 3rd OOM

10 GB slave RAM & 40 partitions (loading; 1st, 2nd, and 3rd runs of PageRank, 10 iterations each):
• Hashing by src. vertices: loading 10s (2.1GB); 1st 189s (13.2GB) (17.5 / 26.1GB); 2nd 186s (18.2GB) (23 / 26.1GB); 3rd OOM
• Hashing by dst. vertices: loading 11s (2.1GB); 1st 223s (15.9GB) (21.1 / 26.1GB); 2nd OOM
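A hash-by-source-vertex strategy like the one in the first row of each table can be written against GraphX's PartitionStrategy interface roughly as follows; this is a sketch, not the exact code used for these measurements, and the path is a placeholder.

```scala
import org.apache.spark.graphx._

// Custom strategy: co-locate all out-edges of a vertex by hashing the source ID.
object HashBySourceVertex extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId,
                            numParts: PartitionID): PartitionID =
    math.abs(src.hashCode()) % numParts
}

val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/soc-LiveJournal1.txt",
                                     numEdgePartitions = 40)
val ranks = graph.partitionBy(HashBySourceVertex).staticPageRank(numIter = 10).vertices
```

Hashing by destination vertex IDs is the same sketch with `dst` in place of `src`.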

Page 32: Ling liu part 02:big graph processing

Spark/GraphX Experience and Messaging Cost

• Our initial experience with Spark
  – Spark performs well with large per-node memory (>= 68 GB DRAM), as reported in the Spark/GraphX paper
  – It does not perform as well on clusters whose nodes have smaller DRAM
• Messaging overheads
  – Distributed graph processing systems do not scale well as the number of nodes increases, due to the messaging cost among compute nodes in the cluster to synchronize the computation in each iteration round

Page 33: Ling liu part 02:big graph processing

Summary

GraphMap
• A distributed iterative graph computation framework that effectively utilizes secondary storage
• Clear separation between mutable and read-only data
• Locality-based data placement on disk
• Dynamic access methods based on the workloads of the current iteration

Ongoing research
• Disk and worker colocation to improve disk access performance
• Efficient and lightweight partitioning techniques, incorporating our work on GraphLego for single-PC graph processing [ACM HPDC 2015]
• Comparing with Spark/GraphX on a larger-DRAM cluster

Page 34: Ling liu part 02:big graph processing

General-Purpose Distributed Graph Systems

Existing state of the art
• Separate efforts for the two representative classes of graph operations
• Separate efforts for scale-up and scale-out systems

Challenges for developing a general-purpose graph processing system
• Different data access patterns / graph computation models
• Different inter-node communication effects

Possible directions
• Graph summarization techniques
• Lightweight graph partitioning techniques
• Optimized data storage systems and access methods

Page 35: Ling liu part 02:big graph processing

VISIT:
https://sites.google.com/site/gtshape/
https://sites.google.com/site/git_GraphLego/
https://sites.google.com/site/git_GraphTwist/