![Page 1: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/1.jpg)
GraphR: Accelerating Graph Processing Using ReRAM
Linghao Song*, Youwei Zhuo#, Xuehai Qian#, Hai Li*, Yiran Chen*
*Duke University, #University of Southern California
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
![Page 2: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/2.jpg)
Graph Processing
• To understand relationships in a group of nodes
• A wide range of application domains
  - Bioinformatics, Social Networks, Cyber Security, Data Mining…
• Classic algorithms:
  - Sparse Matrix Vector Multiplication (SpMV)
  - Single Source Shortest Path (SSSP)
  - PageRank
(en.wikiquote.org)
![Page 3: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/3.jpg)
The Need for Graph Processing Accelerators
• Graph processing algorithms:
  • Generate random access
  • Require high memory bandwidth
• Good target for hardware acceleration
  • Tesseract (ISCA’15): HMC + in-order cores
  • Graphicionado (MICRO’16): dedicated memory accessing module
  • Energy Efficient Architecture for Graph (ISCA’16): asynchronous execution
• These accelerators are based on:
  • Vertex-centric processing model
  • Conventional CMOS technology
![Page 4: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/4.jpg)
ReRAM Based Acceleration
• ReRAM Xbar for Matrix-Vector Multiplication
[Figure: ReRAM crossbar computing Y=W*X. Wordline: X, Xbar: W, Bitline: Y.]
(PRIME ISCA’16, ISAAC ISCA’16, Dot-Product Engine DAC’16, RENO DAC’15)
[Figure: mapping from layer l to layer l+1 onto a crossbar. Dimensions shown: 114, 114, 128 for layer l; 3, 3, 512 for the kernels; 112, 112, 512 for layer l+1. Wordlines WL_0 … WL_2303 (2304=3*3*128*2), bitlines BL_0 … BL_511, input vectors 0 … 12543 (12544=112*112).]
(PipeLayer HPCA’17)
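As a rough illustration of the Y=W*X picture on this slide, the sketch below models an idealized crossbar in plain Python: each cell holds a value W[i][j], the inputs X[i] drive the wordlines, and each bitline sums the contributions of its column. The function name `crossbar_mvm`, the cell layout, and the tiny 3-by-2 example are assumptions made for illustration; DAC/ADC conversion, quantization, and device noise are not modelled. The last lines only re-check the dimension arithmetic quoted on the slide.

```python
def crossbar_mvm(W, X):
    """Idealized crossbar: Y[j] = sum_i W[i][j] * X[i].

    W[i][j] is the cell at wordline i / bitline j (assumed layout),
    X[i] is the input applied to wordline i, Y[j] the sum on bitline j.
    """
    n_wl = len(W)
    n_bl = len(W[0])
    assert len(X) == n_wl, "one input per wordline"
    return [sum(W[i][j] * X[i] for i in range(n_wl)) for j in range(n_bl)]


# Hypothetical crossbar with 3 wordlines and 2 bitlines.
W = [[1, 2],
     [0, 1],
     [3, 0]]
X = [1, 2, 3]
Y = crossbar_mvm(W, X)
print(Y)  # [10, 4]

# Dimension arithmetic quoted on the slide.
wordlines = 3 * 3 * 128 * 2
vectors = 112 * 112
print(wordlines, vectors)  # 2304 12544
```

All products in one column are accumulated in a single step here, which is the property the later slides rely on.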
![Page 5: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/5.jpg)
Graph Processing in Action
• Vertex-centric processing model
[Figure: vertex-centric processing over the Edges and Vertices arrays, annotated "random access" and "global random access".]
High memory bandwidth: little computation on the randomly fetched data
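A minimal sketch of the vertex-centric model, assuming a CSR-like layout (`offsets`, `neighbors`) and a simple "sum of in-neighbour values" update. These names and the update rule are illustrative assumptions, not taken from the slides. Each vertex walks its own edge list, and every neighbour lookup lands at an unpredictable position in `values`, which is the random access the slide refers to.

```python
def vertex_centric_step(offsets, neighbors, values):
    """One pull-style iteration: each vertex sums its in-neighbours' values."""
    new_values = []
    touched = []  # order in which `values` is read, to expose the access pattern
    for v in range(len(offsets) - 1):
        acc = 0
        for e in range(offsets[v], offsets[v + 1]):
            u = neighbors[e]      # in-neighbour id
            touched.append(u)     # random access into `values`
            acc += values[u]      # little computation per fetched value
        new_values.append(acc)
    return new_values, touched


# Hypothetical 4-vertex graph, in-neighbour lists: 0:[3,1], 1:[2], 2:[0,3], 3:[1]
offsets = [0, 2, 3, 5, 6]
neighbors = [3, 1, 2, 0, 3, 1]
values = [10, 20, 30, 40]
new_values, touched = vertex_centric_step(offsets, neighbors, values)
print(new_values)  # [60, 30, 50, 20]
print(touched)     # [3, 1, 2, 0, 3, 1]
```

The `touched` list jumps around the vertex array even for this tiny graph, while each fetched value is used for a single addition.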
![Page 6: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/6.jpg)
Graph Processing in Action
• Edge-centric Processing Model
[Figure: edge-centric processing over the Edges and Vertices arrays. Labels: "Read & Process", "Generate Updates", "Renew", "sequential read", "sequential write".]
Sequential edge access. Random vertex access.
(X-Stream SOSP’13)
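The edge-centric model can be sketched as two streaming passes, loosely in the spirit of the scatter/gather phases the slide labels "Generate Updates" and "Renew". The function name, the tuple layout, and the sum-style update are assumptions for illustration and are not X-Stream's actual interface.

```python
def edge_centric_step(edges, values):
    """edges: list of (src, dst) streamed in order; values: per-vertex state."""
    # Phase 1: stream edges sequentially and append updates sequentially.
    updates = []
    for src, dst in edges:                  # sequential edge read
        updates.append((dst, values[src]))  # random vertex read, sequential write
    # Phase 2: stream updates sequentially and renew destination vertices.
    new_values = [0] * len(values)
    for dst, val in updates:                # sequential update read
        new_values[dst] += val              # random vertex write
    return new_values, updates


# Hypothetical 4-vertex graph given as an edge list.
edges = [(3, 0), (1, 0), (2, 1), (0, 2), (3, 2), (1, 3)]
values = [10, 20, 30, 40]
new_values, updates = edge_centric_step(edges, values)
print(new_values)  # [60, 30, 50, 20]
```

The edge list and the update list are only ever read or written front to back; the vertex array is still accessed at arbitrary positions.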
![Page 7: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/7.jpg)
GraphR: Graph Processing with ReRAM Xbar
[Figure: example graph with vertices V1, V2, V4, V7, V15. Steps: Process Edge, then Reduce/Apply. For V4 (in-edges from V1, V2, V7): edges to V4 × value of all vertices = new value of V4.]
ReRAM Crossbar (CB): perform SpMV in analog manner
But, WAIT! A Xbar with a size of V-by-V? The matrix is sparse.
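Written out for a sum-style reduce, and assuming (as the slide's labels suggest) that V1, V2 and V7 are the in-neighbours of V4, the update is one row-times-vector product. The symbols $w_{u,v}$ (weight of edge $u \to v$) and $x_u$ (current value of vertex $u$) are introduced here for illustration only.

```latex
x_4^{\text{new}} = \sum_{u \in \{1,\,2,\,7\}} w_{u,4}\, x_u ,
\qquad
\mathbf{y} = W\,\mathbf{x}, \quad
W_{v,u} =
\begin{cases}
w_{u,v} & \text{if there is an edge } u \to v,\\
0 & \text{otherwise.}
\end{cases}
```

A dense $W$ has $|V|^2$ entries, of which only $|E|$ are nonzero, which is the concern the slide raises about a V-by-V Xbar.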
![Page 8: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/8.jpg)
Storage and Computation Efficiency
Store: Compressed Format
  Coordinate List (COO), as (row,col,val): (0,2,3) (0,3,8) (1,2,7) (2,0,1) (3,1,4) (3,3,2)
Computation: ReRAM Xbar SpMV
  Y=W*X
[Figure: GraphR placed against two axes, Storage Efficiency and Computation Efficiency.]
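To make the two representations concrete, the sketch below takes the COO triples from this slide, expands them into the dense 4-by-4 block that a crossbar would hold, and computes Y=W*X both ways. The input vector `X`, the helper names, and the convention Y[row] = sum over col of W[row][col] * X[col] are assumptions for illustration; the slide does not state which index is the source.

```python
# COO triples (row, col, val) copied from the slide.
coo = [(0, 2, 3), (0, 3, 8), (1, 2, 7), (2, 0, 1), (3, 1, 4), (3, 3, 2)]


def coo_to_dense(coo, n):
    """Expand a COO list into an n x n dense block (what an Xbar would hold)."""
    W = [[0] * n for _ in range(n)]
    for row, col, val in coo:
        W[row][col] = val
    return W


def spmv_dense(W, X):
    """Y = W * X with Y[row] = sum_col W[row][col] * X[col]."""
    return [sum(w * x for w, x in zip(row, X)) for row in W]


def spmv_coo(coo, X, n):
    """Same product computed directly from the compressed form."""
    Y = [0] * n
    for row, col, val in coo:
        Y[row] += val * X[col]
    return Y


X = [1, 2, 3, 4]  # hypothetical input vector
W = coo_to_dense(coo, 4)
print(W)                    # [[0, 0, 3, 8], [0, 0, 7, 0], [1, 0, 0, 0], [0, 4, 0, 2]]
print(spmv_dense(W, X))     # [41, 21, 1, 16]
print(spmv_coo(coo, X, 4))  # [41, 21, 1, 16]
print(len(coo), "stored entries vs", 4 * 4, "dense cells")
```

COO keeps 6 entries where the dense block needs 16 cells, while the dense form is the one that maps directly onto a crossbar.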
![Page 9: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/9.jpg)
GraphR Overview
[Figure: GraphR overview. Software: Graph Preprocessing produces an ordered edge list (COO) on disk. GraphR: Memory ReRAM, Graph Engines (GEs), and the Controller in GraphR. Flow: Load Block(i) (sequential disk I/O), then streaming-apply subgraphs (sequential access).]
![Page 10: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/10.jpg)
Stream-Apply Execution
[Figure: the edge matrix (Source Vertices by Destination Vertices) is divided into blocks kept in storage in COO form ("Block in Storage (COO)"). The block currently loaded ("Block in GraphR") is processed by four GEs, with inputs RegI-1 to RegI-4 and output RegO.]
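A minimal software analogue of stream-apply, under several assumptions: edges are (src, dst, val) triples, the graph is cut into square blocks of `block` vertices, blocks are visited destination block by destination block, and a plain multiply-accumulate stands in for whatever a GE computes. None of the names below come from GraphR itself; the point is only that each non-empty block is loaded once, expanded to a small dense tile, and applied.

```python
from collections import defaultdict


def partition_into_blocks(coo, block):
    """Group (src, dst, val) edges by (src // block, dst // block)."""
    blocks = defaultdict(list)
    for src, dst, val in coo:
        blocks[(src // block, dst // block)].append((src, dst, val))
    return blocks


def stream_apply(coo, x, n, block):
    """Visit blocks in a fixed order; each block becomes a dense tile."""
    assert n % block == 0, "sketch assumes n is a multiple of the block size"
    blocks = partition_into_blocks(coo, block)
    n_blk = n // block
    y = [0] * n
    order = []  # which blocks were actually loaded, in order
    for dst_blk in range(n_blk):          # assumed order: by destination block
        for src_blk in range(n_blk):
            edges = blocks.get((src_blk, dst_blk))
            if not edges:
                continue                  # empty blocks are never loaded
            order.append((src_blk, dst_blk))
            # "Load" the block into a dense tile (stand-in for a GE's Xbar).
            tile = [[0] * block for _ in range(block)]
            for src, dst, val in edges:
                tile[src % block][dst % block] = val
            # Apply: accumulate into the destination slice (stand-in for RegO).
            for i in range(block):
                for j in range(block):
                    y[dst_blk * block + j] += tile[i][j] * x[src_blk * block + i]
    return y, order


coo = [(0, 2, 3), (0, 3, 8), (1, 2, 7), (2, 0, 1), (3, 1, 4), (3, 3, 2)]
x = [1, 2, 3, 4]  # hypothetical source values
y, order = stream_apply(coo, x, n=4, block=2)
print(y)      # [3, 16, 17, 16]
print(order)  # [(1, 0), (0, 1), (1, 1)]
```

Only three of the four blocks contain edges, so only three tiles are loaded; the block with no edges costs nothing.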
![Page 11: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/11.jpg)
Graph Engine (GE) Processing Patterns
• Different algorithms achieve different parallelism when mapped to Xbars
• Assuming an N×N Xbar
• Parallel Multiply-Accumulate (MAC)
  • Performing N² multiplications and N² additions in parallel
• Parallel Add-op
  • Performing N additions and N ops (can be defined) in parallel
![Page 12: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/12.jpg)
Parallel MAC
• Performing N² multiplications and N² additions in parallel
[Figure: 4×4 example with sources i0..i3 and destinations j0..j3. Each source holds 1/4 (i0:1/4, i1:1/4, i2:1/4, i3:1/4); the edge weights shown are 1/3, 1/2 and 1; the resulting destination values shown are 9/60, 13/60, 25/60 and 13/60.]
16 MULT, 16 ADD
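The parallel MAC pattern can be sketched with exact fractions. The 4-vertex graph and the damping factor 4/5 below are assumptions: they were picked so that a PageRank-style update starting from 1/4 per source lands on values of the same form as those on the slide, and they are not claimed to be the slide's exact graph. The loop counts every multiply and add to show where the "16 MULT, 16 ADD" for N=4 comes from.

```python
from fractions import Fraction as F


def parallel_mac(W, x):
    """out[j] = sum_i W[i][j] * x[i]; an N x N Xbar does all N*N products at once."""
    n = len(W)
    mults = adds = 0
    out = [F(0)] * n
    for j in range(n):
        for i in range(n):
            out[j] += W[i][j] * x[i]
            mults += 1
            adds += 1
    return out, mults, adds


# Hypothetical graph: W[i][j] = 1/outdeg(i) if there is an edge i -> j.
W = [
    [F(0),    F(1, 3), F(1, 3), F(1, 3)],  # i0 -> j1, j2, j3
    [F(1, 2), F(0),    F(1, 2), F(0)],     # i1 -> j0, j2
    [F(0),    F(1, 2), F(0),    F(1, 2)],  # i2 -> j1, j3
    [F(0),    F(0),    F(1),    F(0)],     # i3 -> j2
]
x = [F(1, 4)] * 4  # every source starts at 1/4, as on the slide
d = F(4, 5)        # assumed damping factor

mac, mults, adds = parallel_mac(W, x)
rank = [(1 - d) / 4 + d * m for m in mac]
print(mults, adds)  # 16 16
print(rank)         # equal to 9/60, 13/60, 25/60, 13/60 (printed in lowest terms)
```

In software the 16 multiply-adds run one after another; on an N×N crossbar they correspond to a single analog read.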
![Page 13: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/13.jpg)
Parallel Add-op
• Performing N additions and N ops (can be defined) in parallel
[Figure: 4×4 example with op = min. Source values (RegI): i0:4, i1:3, i2:1, i3:2. Destination values (RegO) start at j0:7, j1:6, j2:M, j3:M. One source row is processed per step: the candidate rows are (M 5 9 M), (M M 6 4), (M M M M) and (M M 3 M), and the element-wise min updates Dst to (7 5 9 M), (7 5 6 4), (7 5 6 4) and (7 5 3 4). Final: j0:7, j1:5, j2:3, j3:4.]
4 adds, 4 ops
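The add-op pattern can be replayed step by step. The source and destination values are the ones shown on the slide; the edge weights are inferred from the slide's intermediate rows and should be treated as an assumption, as should the use of infinity for the slide's "M". Each step performs N additions (source value plus each edge weight in that row) and N ops (here `min` against the current destination values).

```python
M = float("inf")  # stand-in for the slide's "M" (assumed: no edge / unreached)


def parallel_add_op(weights_row, src_value, dst, op=min):
    """One step: N additions, then N element-wise ops against dst."""
    candidates = [src_value + w for w in weights_row]        # N adds
    new_dst = [op(d, c) for d, c in zip(dst, candidates)]    # N ops
    return new_dst, candidates


# Edge weights inferred from the slide's intermediate rows (assumed).
W = [
    [M, 1, 5, M],  # i0 -> j1 (1), j2 (5)
    [M, M, 3, 1],  # i1 -> j2 (3), j3 (1)
    [M, M, M, M],  # i2 has no edges in this block
    [M, M, 1, M],  # i3 -> j2 (1)
]
src = [4, 3, 1, 2]  # RegI: i0:4, i1:3, i2:1, i3:2
dst = [7, 6, M, M]  # RegO: j0:7, j1:6, j2:M, j3:M

history = []
for i in range(4):
    dst, cand = parallel_add_op(W[i], src[i], dst)
    history.append(dst)
    print(f"i{i}: candidates={cand} -> dst={dst}")

print(dst)  # [7, 5, 3, 4]
```

Swapping `op` for another reduction changes the algorithm without changing the access pattern, which is what "can be defined" on the slide suggests.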
![Page 14: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/14.jpg)
Also in the paper …
• Graph dataset preprocessing method
• Hardware components in GraphR
• Detailed comparison to other accelerators (Table 1)
![Page 15: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/15.jpg)
Evaluation
• Evaluation Setup
- Data Sets
- Applications: PageRank, BFS, SSSP, SpMV
- CPU: Intel Xeon E5-2630 V3
- GPU: NVIDIA Tesla K40c
- GraphR: 8-8 Xbar, 32 Xbars/GE, 64 GEs
![Page 16: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/16.jpg)
CPU Comparison: Performance
- Gmean: Performance 16.01x
- SpMV, PageRank > BFS, SSSP
- Parallel MAC leads to higher speedup
![Page 17: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/17.jpg)
CPU Comparison: Energy Efficiency
- Energy Efficiency 33.82x
- SpMV, PageRank > BFS, SSSP
- Parallel MAC leads to higher energy efficiency
![Page 18: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/18.jpg)
GPU Comparison
• Speedup: 1.69× to 2.19×
• Energy Efficiency: 4.77× to 8.91×
![Page 19: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/19.jpg)
Accelerator Comparison
• Speedup: 1.16× to 4.12×
• Energy Efficiency: 3.67× to 10.96×
![Page 20: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/20.jpg)
Sensitivity to Density
• Density↑ -> Speedup & Energy Efficiency↑
• Achieving greater parallelism
![Page 21: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load](https://reader034.vdocuments.mx/reader034/viewer/2022051408/6003da8fc13a8819f8274dd9/html5/thumbnails/21.jpg)
Conclusion
• We propose GraphR:
  • A graph processing accelerator based on ReRAM
• Key Insights/Results:
  • ReRAM based SpMV for processing in graph engine
  • Stream-apply execution
  • Parallel MAC and Add-Op patterns
  • 16.01x performance gain and 33.82x energy efficiency gain over the CPU baseline