sparse matrices for high performance graph analy8csgilbert/talks/gilbertpurduejanuary2016.pdfariful...
TRANSCRIPT
1
SparseMatricesforHighPerformanceGraphAnaly8cs
JohnR.GilbertUniversityofCalifornia,SantaBarbaraPurdueUniversityJanuary27,2016
Support at UCSB: Intel, Microsoft, DOE Office of Science, NSF
2
Thanks …
Ariful Azad, David Bader, Lucas Bang, Jon Berry, Eric Boman, Aydin Buluc, John Conroy, Kevin Deweese,
Erika Duriakova, Armando Fox, Joey Gonzalez, Shoaib Kamil, Jeremy Kepner, Tristan Konolige, Manoj Kumar, Adam Lugowski, Tim Mattson, Brad McRae, Henning Meyerhenke, Dave Mizell, Jose Moreira, Lenny Oliker,
Carey Priebe, Steve Reinhardt, Lijie Ren, Eric Robinson, Viral Shah, Veronika Strnadova-Neely, Yun Teng, Joshua Vogelstein, Drew Waranis, Sam Williams
3
Outline
• Mo8va8on:Graphapplica8ons
• Mathema8cs:Sparsematricesforgraphalgorithms
• SoLware:CombBLAS,KDT,QuadMat
• Standards:TheGraphBLASeffort
4
Computa8onalmodelsofthephysicalworld
Cortical bone"
Trabecular bone"
5
Largegraphsareeverywhere…
WWWsnapshot,courtesyY.Hyun Yeastproteininterac8onnetwork,courtesyH.Jeong
• Internetstructure• Socialinterac8ons
• Scien8ficdatasets:biological,chemical,cosmological,ecological,…
6
Co-authorgraphfrom1993
Householdersymposium
Socialnetworkanalysis(1993)
7
Facebookgraph:>1,200,000,000ver8ces
Socialnetworkanalysis(2016)
8
Computers
Con,nuousphysicalmodeling
Linearalgebra
Discretestructureanalysis
Graphtheory
Computers
Themiddlewarechallengeforgraphanalysis
9
Top 500 List (November 2015)
= x PA L U
Top500 Benchmark: Solve a large system of linear equations
by Gaussian elimination
10
Graph 500 List (November 2015)
Graph500 Benchmark:
Breadth-first search in a large
power-law graph
1 2
3
4 7
6
5
11
Floating-Point vs. Graphs, November 2015
= x P A L U1 2
3
4 7
6
5
34 Peta / 38 Tera is about 900.
34 Petaflops 38 Terateps
12
Nov 2015: 34 Peta / 38 Tera ~ 900 Nov 2010: 2.5 Peta / 6.6 Giga ~ 380,000
Floating-Point vs. Graphs, November 2015
= x P A L U1 2
3
4 7
6
5
34 Petaflops 38 Terateps
13
• Byanalogytonumericalscien8ficcompu8ng...
• WhatshouldthecombinatorialBLASlooklike?
Themiddlewarechallengeforgraphanalysis
C = A*B
y = A*x
µ = xT y
BasicLinearAlgebraSubrou,nes(BLAS):Ops/Secvs.MatrixSize
Thecaseforsparsematrices
Coarse-grained parallelism can be exploited by abstractions at the right level.
Vertex/edgegraphcomputa,ons
Graphsinthelanguageoflinearalgebra
Unpredictable,data-drivencommunica8onpaberns
Fixedcommunica8onpaberns
Irregulardataaccesses,withpoorlocality
Matrixblockopera8onsexploitmemoryhierarchy
Finegraineddataaccesses,dominatedbylatency
Coarsegrainedparallelism,limitedbybandwidthnotlatency
Sparsematrix-sparsematrixmul8plica8on
*
Sparsematrix-sparsevectormul8plica8on
*
.*
Sparsearrayprimi8vesforgraphs
Element-wiseopera8ons Sparsematrixindexing
Matricesovervarioussemirings:(+,×),(and,or),(min,+),…
16
Mul8ple-sourcebreadth-firstsearch
B
1 2
3
4 7
6
5
AT
17
Mul8ple-sourcebreadth-firstsearch
• Sparsearrayrepresenta8on=>spaceefficient
• Sparsematrix-matrixmul8plica8on=>workefficient
• Threepossiblelevelsofparallelism:searches,ver8ces,edges
BAT AT B
à
1 2
3
4 7
6
5
18
Examplesofsemiringsingraphalgorithms
(“values”:edge/vertexabributes,“add”:vertexdataaggrega8on,“mul8ply”:edgedataprocessing)
Generalschemaforuser-specifiedcomputa8onatver8cesandedges
Realfield:(R,+,*) Numericallinearalgebra
Booleanalgebra:({01},|,&)
Graphtraversal
Tropicalsemiring:(RU{∞},min,+)
Shortestpaths
(S,select,select)
Selectsubgraph,orcontractnodestoformquo8entgraph
Graphcontrac8onviasparsetripleproduct
5
6
3
1 2
4
A1
A3A2
A1
A2 A3
Contract
1 5 2 3 4 6 1
5
2 3 4
6
1 1 0 00 00 0 1 10 00 0 0 01 1
1 1 01 0 10 1 01 11 1
0 0 1
x x =
1 5 2 3 4 6 1 2 3
Subgraphextrac8onviasparsetripleproduct
5
6
3
1 2
4
Extract 3
12
1 5 2 3 4 6 1
5
2 3 4
6
1 1 1 00 00 0 1 11 00 0 0 01 1
1 1 01 0 11 1 01 11 1
0 0 1
x x =
1 5 2 3 4 6 1 2 3
21
Coun8ngtriangles(clusteringcoefficient)
A
5
6
3
1 2
4
Clusteringcoefficient:
• Pr(wedgei-j-kmakesatrianglewithedgei-k)
• 3*#triangles/#wedges
• 3*4/19=0.63inexample
• maywanttocomputeforeachvertexj
22
A
5
6
3
1 2
4
Clusteringcoefficient:
• Pr(wedgei-j-kmakesatrianglewithedgei-k)
• 3*#triangles/#wedges
• 3*4/19=0.63inexample
• maywanttocomputeforeachvertexj
“Cohen’s”algorithmtocounttriangles:
-Counttrianglesbylowest-degreevertex.
- Enumerate“low-hinged”wedges.
-Keepwedgesthatclose.
hi hilo
hi hilo
hihilo
Coun8ngtriangles(clusteringcoefficient)
23
A L U
1 2
1 1 1 2
C
A = L + U (hi->lo + lo->hi) L × U = B (wedge, low hinge)A ∧B = C (closed wedge)sum(C)/2 = 4 triangles
A
5
6
3
1 2
4 5
6
3
1 2
4
1
1
2
B, C
Coun8ngtriangles(clusteringcoefficient)
• Aimedatgraphalgorithmdesigners/programmerswhoarenotexpertinmappingalgorithmstoparallelhardware.
• Flexible,templatedC++interface.• Scalableperformancefromlaptopto100,000-processorHPC.
• OpensourcesoLware,version1.5.0releasedJanuary2016.
Anextensibledistributed-memorylibraryofferingasmallbutpowerfulsetoflinearalgebraicopera8ons
specificallytarge8nggraphanaly8cs.
CombinatorialBLAS
hbp://gauss.cs.ucsb.edu/~aydin/CombBLAS
CombinatorialBLASindistributedmemory
CommGrid
DCSC CSC CSBTriples
SpMatSpDistMatDenseDistMat
DistMat
Enforcesinterfaceonly
CombinatorialBLASfunc7onsandoperators
DenseDistVecSpDistVec
FullyDistVec...HASA
Polymorphism
Matrix/vectordistribu8ons,interleavedoneachother.
5
8
€
x1,1
€
x1,2
€
x1,3
€
x2,1
€
x2,2
€
x2,3
€
x3,1
€
x3,2
€
x3,3
€
A1,1
€
A1,2
€
A1,3
€
A2,1
€
A2,2
€
A2,3
€
A3,1
€
A3,2
€
A3,3
€
n pr€
n pc
2DLayoutforSparseMatrices&Vectors
-2Dmatrixlayoutwinsover1Dwithlargecorecountsandwithlimitedbandwidth/compute-2Dvectorlayoutsome8mesimportantforloadbalance
Defaultdistribu8oninCombinatorial BLAS.Scalablewithincreasingnumberofprocesses
Benchmarkinggraphanaly8csframeworks
CombinatorialBLASwasfastestamongalltested
graphprocessingframeworkson3outof4benchmarks
inanindependentstudybyIntel.
Sa8shetal."Naviga8ngtheMazeofGraphAnaly8csFrameworksusingMassiveGraphDatasets”,inSIGMOD’14
28
CombinatorialBLAS“users”(Jan2016)
• IBM (T.J.Watson, Zurich, & Tokyo) • Intel • Cray • Microsoft • Stanford • MIT • UC Berkeley • Carnegie-Mellon • Georgia Tech • Purdue • Ohio State • U Texas Austin • NC State • UC San Diego • UC Merced • UC Santa Barbara
• Berkeley Lab • Sandia Labs • Columbia • U Minnesota • Duke • Indiana U • Mississippi State • SEI • Paradigm4 • Mellanox • IHPC (Singapore)
• Tokyo Inst of Technology
• Chinese Academy of Sciences • U Canterbury (New Zealand)
• King Fahd U (Saudi Arabia) • Bilkent U (Turkey)
• U Ghent (Belgium)
• Aimedatdomainexpertswhoknowtheirproblemwellbutdon’tknowhowtoprogramasupercomputer
• Easy-to-usePythoninterface• Runsonalaptopaswellasaclusterwith10,000processors
• OpensourcesoLware(NewBSDlicense)
Ageneralgraphlibrarywithopera8onsbasedonlinear
algebraicprimi8ves
KnowledgeDiscoveryToolboxhbp://kdt.sourceforge.net/
Example:• Vertextypes:Person,Phone,
Camera,Gene,Pathway• Edgetypes:PhoneCall,TextMessage,
CoLoca8on,SequenceSimilarity• Edgeabributes:Time,Dura8on
• Calculatecentralityjustforemailsamongengineerssentbetweengivenstartandend8mes
Abributedseman8cgraphsandfilters
def onlyEngineers (self): return self.position == Engineer def timedEmail (self, sTime, eTime): return ((self.type == email) and (self.Time > sTime) and (self.Time < eTime)) G.addVFilter(onlyEngineers) G.addEFilter(timedEmail(start, end)) # rank via centrality based on recent email transactions among engineers bc = G.rank(’approxBC’)
KDT$Algorithm$
CombBLAS$Primi4ve$
Filter$(Py)$
Python'
C++'
Semiring$(Py)$KDT$Algorithm$
CombBLAS$Primi4ve$ Filter$(C++)$
Semiring$(C++)$
Standard$KDT$ KDT+SEJITS$
SEJITS$$$$Transla4on$
Filter$(Py)$
Semiring$(Py)$
SEJITSforfilter/semiringaccelera8on
EmbeddedDSL:Pythonforthewholeapplica8on• Introspect,translatePythontoequivalentC++code• Callcompiled/op8mizedC++insteadofPython
FilteredBFSwithSEJITS
!"#$%!"$!%&"!!%#"!!%'"!!%("!!%&)"!!%*#"!!%)'"!!%
&#&% #$)% $+)% &!#'% #!#$%
!"#$%&'
(%)*
"%
+,*-".%/0%!12%3./4"55"5%
,-.% /012./3,-.% 456789:/%
Time(inseconds)forasingleBFSitera8ononscale25RMAT(33Mver8ces,500Medges)with10%ofelementspassingfilter.MachineisNERSC’sHopper.
33
Afewothergraphalgorithmswe’veimplementedinlinearalgebraicstyle
• Maximalindependentset(KDT/SEJITS)[BDFGKLOW2013]
• Peer-pressureclustering(SPARQL)[DGLMR2013]
• Time-dependentshortestpaths(CombBLAS)[Ren2012]
• Gaussianbeliefpropaga8on(KDT)[LABGRTW2011]
• Markoffclustering(CombBLAS,KDT)[BG2011,LABGRTW2011]
• Betweennesscentrality(CombBLAS)[BG2011]
• Geometricmeshpar88oning(MatlabJ)[GMT1998]
34
The(original)BLAS
• ExpertsinmappingalgorithmstohardwaretuneBLASforspecificpla�orms.
• ExpertsinnumericallinearalgebrabuildsoLwareontopoftheBLAStogethighperformance“forfree.”
Todayeverycomputer,phone,etc.comeswith/usr/lib/libblas!
TheBasicLinearAlgebraSubrou8neshadarevolu8onaryimpact
oncomputa8onallinearalgebra.
BLAS1 vectorops Lawson,Hanson,Kincaid,Krogh,1979
LINPACK
BLAS2 matrix-vectorops
Dongarra,DuCroz,Hammarling,Hanson,1988
LINPACKonvectormachines
BLAS3 matrix-matrixops
Dongarra,DuCroz,Duff,Hammarling,1990
LAPACKoncachebasedmachines
Canwestandardizea“GraphBLAS”?
Canwestandardizea“GraphBLAS”?
No, it’s not reasonable to define a universal set
of building blocks.
• Huge diversity in matching graph algorithms to hardware platforms. • No consensus on data structures or linguistic primitives. • Lots of graph algorithms remain to be discovered. • Early standardization can inhibit innovation.
Canwestandardizea“GraphBLAS”?
Yes, it is reasonable to define a universal set
of building blocks…
… for graphs as linear algebra.
• Representing graphs in the language of linear algebra is a mature field. • Algorithms, high level interfaces, and implementations vary. • But the core primitives are well established.
TheGraphBLASeffort
• Manifesto, �
HPEC 2013:
• Workshops at IPDPS and HPEC – next in May 2016• Periodic working group telecons and meetings• Graph BLAS Forum: http://graphblas.org
Abstract-- It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position paper defining the problem and announcing our intention to launch an open effort to define this standard.
SomeGraphBLASbasicfunc8ons(namesnotfinal)
Func,on(CombBLASequiv)
Parameters Returns Matlabnota,on
matmul(SpGEMM)
-sparsematricesAandB-op8onalunaryfuncts
sparsematrix C=A*B
matvec(SpM{Sp}V)
-sparsematrixA-sparse/densevectorx
sparse/densevector y=A*x
ewisemult,add,…(SpEWiseX)
-sparsematricesorvectors-binaryfunct,op8onalunarys
inplaceorsparsematrix/vector
C=A.*BC=A+B
reduce(Reduce)
-sparsematrixAandfunct densevector y=sum(A,op)
extract(SpRef)
-sparsematrixA-indexvectorspandq
sparsematrix B=A(p,q)
assign(SpAsgn)
-sparsematricesAandB-indexvectorspandq
none A(p,q)=B
buildMatrix(Sparse)
-listofedges/triples(i,j,v)
sparsematrix A=sparse(i,j,v,m,n)
getTuples(Find)
-sparsematrixA
edgelist [i,j,v]=find(A)
Matrix8mesmatrixoversemiring
Inputs matrix A: MxN (sparse or dense) matrix B: NxL (sparse or dense)Optional Inputs matrix C: MxL (sparse or dense) scalar “add” function ⊕ scalar “multiply” function ⊗ transpose flags for A, B, COutputs matrix C: MxL (sparse or dense)
Specific cases and function names:SpGEMM: sparse matrix times sparse matrixSpMSpV: sparse matrix times sparse vectorSpMV: sparse matrix times dense vectorSpMM: sparse matrix times dense matrix
Notesis the set of scalars, user-specifieddefaults to IEEE double float
⊕ defaults to floating-point +⊗ defaults to floating-point *
Implements C ⊕= A ⊕.⊗ B
for j = 1 : N C(i,k) = C(i,k) ⊕ (A(i,j) ⊗ B(j,k))
If input C is omitted, implements C = A ⊕.⊗ B
Transpose flags specify operation � on AT, BT, and/or CT instead
41
Breadth-firstsearchwith(draL)GraphBLAS
Conclusions
• Graphanalysispresentschallengesin:– Performance(bothserialandparallel)– SoLwarecomplexity– Interoperability
• Implemen8nggraphalgorithmsusingmatrix-basedapproachesprovidesseveralusefulsolu8onstothesechallenges.
• ResearchersfromIntel,IBM,Nvidia,LBL,MIT,UCSB,GaTech,KIT,etc.havejoinedtogethertocreatetheGraphBLASstandard.
• Severalimplementa8onsinprogress:– C++:CombBLAS(LBL,UCSB),GraphMAT(Intel)– C:GraphProgrammingInterface(IBM),S8nger(GaTech)– Java:Graphulo(MIT)– Python:NetworkKit(KIT)
43
Thank You!
Computers
Continuous physical modeling
Linear algebra & graph theory
Discrete structure analysis
http://graphblas.org