Knowledge Discovery Toolbox
Adam Lugowski. kdt.sourceforge.net.
![Page 2: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/2.jpg)
Our users: Domain Experts
[Figure: workflow diagram with four numbered steps: cull relevant data, build input graph, analyze graph, interpret results. Supporting components shown: data filtering technologies, KDT, graph viz engine.]
![Page 3: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/3.jpg)
Example workflow
![Page 4: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/4.jpg)
How to target Domain Experts?
• Conceptually simple
• Customizable
• High Performance
![Page 5: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/5.jpg)
Complex methods (Domain Experts)
• centrality('approxBC'), pageRank, cluster('Markov'), contract, ...
Building blocks (Algorithm Experts)
• DiGraph: bfsTree, neighbor, degree, subgraph, load, UFget, +, -, sum, scale
• Mat: SpMV, SpGEMM, load, eye, reduce, scale, +, []
• Vec: max, norm, sort, abs, any, ceil, range, ones, +, -, *, /, >, ==, &, []
Underlying infrastructure (Combinatorial BLAS) (HPC Experts)
• SpMV, SpMV_SemiRing, SpGEMM, SpGEMM_SemiRing
• Sparse-matrix classes/methods (e.g., Apply, EWiseApply, Reduce)
![Page 6: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/6.jpg)
Why (sparse) adjacency matrices?
| Traditional graph computations | Graphs in the language of linear algebra |
| --- | --- |
| Data driven, unpredictable communication | Fixed communication patterns |
| Irregular and unstructured, poor locality of reference | Operations on matrix blocks exploit memory hierarchy |
| Fine grained data accesses, dominated by latency | Coarse grained parallelism, bandwidth limited |
![Page 7: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/7.jpg)
Example workflow
![Page 8: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/8.jpg)
1. Largest Component

    # the variable bigG contains the input graph
    # find and select the giant component
    comp = bigG.connComp()
    giantComp = comp.hist().argmax()
    G = bigG.subgraph(comp==giantComp)
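To make the giant-component step concrete, here is a minimal pure-Python sketch of the same idea on a small in-memory graph. It does not use KDT: the adjacency-dict representation, the helper names `connected_components` and `giant_component`, and the sample graph are all assumptions made for illustration.

```python
from collections import Counter, deque


def connected_components(adj):
    """Label every vertex with a component id (the role of connComp())."""
    label = {}
    for start in adj:
        if start in label:
            continue
        label[start] = start
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = start
                    queue.append(v)
    return label


def giant_component(adj):
    """Return the subgraph induced by the largest component."""
    comp = connected_components(adj)
    # the role of hist().argmax(): the label shared by the most vertices
    giant = Counter(comp.values()).most_common(1)[0][0]
    keep = {v for v, c in comp.items() if c == giant}
    return {v: [w for w in adj[v] if w in keep] for v in keep}


# Hypothetical undirected graph: one 4-vertex component, one 2-vertex one.
big_g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: [6], 6: [5]}
g = giant_component(big_g)
print(sorted(g))  # vertices of the giant component
```

The KDT version expresses the same three steps (label, pick the most frequent label, take the induced subgraph) as vector and graph operations that run in parallel.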
![Page 9: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/9.jpg)
2. Markov Clustering

    # cluster the graph
    clus = G.cluster('Markov')
![Page 10: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/10.jpg)
3. Graph of Clusters

    # contract the clusters
    smallG = G.contract(clus)
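As an illustration of what contraction produces, the following pure-Python sketch collapses each cluster into a single vertex. It is not KDT's implementation: the function, the adjacency-dict graph, and the cluster labels are hypothetical, and dropping edges internal to a cluster is an assumption of this sketch.

```python
def contract(adj, cluster_of):
    """Collapse each cluster into one vertex; keep edges between clusters."""
    small = {c: set() for c in set(cluster_of.values())}
    for u, neighbors in adj.items():
        for v in neighbors:
            cu, cv = cluster_of[u], cluster_of[v]
            if cu != cv:  # assumption: edges inside a cluster are dropped
                small[cu].add(cv)
    return {c: sorted(nbrs) for c, nbrs in small.items()}


# Hypothetical graph: two triangles joined by the edge 3-4.
g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
# Hypothetical cluster assignment, such as a clustering step might return.
clus = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
small_g = contract(g, clus)
print(small_g)
```

The six-vertex graph shrinks to two vertices joined by one edge, which is the kind of summary graph a domain expert can inspect visually.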
![Page 11: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/11.jpg)
Example workflow KDT code

    # the variable bigG contains the input graph
    # find and select the giant component
    comp = bigG.connComp()
    giantComp = comp.hist().argmax()
    G = bigG.subgraph(comp==giantComp)

    # cluster the graph
    clus = G.cluster('Markov')

    # contract the clusters
    smallG = G.contract(clus)
![Page 12: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/12.jpg)
BFS on a Scale 29 RMAT graph (500M vertices, 8B edges)
Machine: NERSC’s Hopper
![Page 13: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/13.jpg)
Breadth-First Search
[Figure: the example graph G on vertices 1 to 7 and its sparse adjacency matrix, whose nonzero entries are all 1.]
![Page 14: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/14.jpg)
Breadth-First Search
[Figure: the graph G and its adjacency matrix, with the input frontier vector fin containing vertex 7.]
distance 1 from vertex 7
![Page 15: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/15.jpg)
Breadth-First Search
[Figure: the product G × fin = fout. fin contains vertex 7; fout has three entries, each holding the value 7.]
distance 1 from vertex 7
![Page 16: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/16.jpg)
Breadth-First Search
[Figure: the graph G and its adjacency matrix, with the input frontier vector fin containing vertices 3, 4, and 5.]
distance 2 from vertex 7
![Page 17: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/17.jpg)
Breadth-First Search
[Figure: the product G × fin = fout. fin contains vertices 3, 4, and 5; fout has three entries holding the values 4, 4, and 5.]
distance 2 from vertex 7
![Page 18: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/18.jpg)
KDT BFS routine

    # initialization
    parents = Vec(self.nvert(), -1, sparse=False)
    frontier = Vec(self.nvert(), sparse=True)
    parents[root] = root
    frontier[root] = root  # 1st frontier is just the root
    # the semiring mult and add ops simply return the 2nd arg
    semiring = sr((lambda x,y: y), (lambda x,y: y))

    # loop over frontiers
    while frontier.nnn() > 0:
        frontier.spRange()  # frontier[i] = i
        self.e.SpMV(frontier, semiring=semiring, inPlace=True)
        # remove already discovered vertices from the frontier.
        frontier.eWiseApply(parents, op=(lambda f,p: f),
                            doOp=(lambda f,p: p == -1), inPlace=True)
        # update the parents
        parents[frontier] = frontier
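The KDT routine can be mirrored step by step in plain Python, which may help when reading the semiring formulation. The sketch below is an illustration only, not KDT code: sparse vectors are modeled as dicts, the function name `bfs_parents` is made up, and the 7-vertex directed graph is hypothetical (chosen so that the successive frontiers echo the ones on the slides).

```python
def bfs_parents(adj, root):
    """BFS written as repeated sparse matrix-vector products.

    adj[u] lists the out-neighbors of u. Returns a dict vertex -> parent,
    with -1 for vertices that were never reached.
    """
    parents = {v: -1 for v in adj}
    parents[root] = root
    frontier = {root: root}  # sparse vector: index -> value
    while frontier:
        # spRange step: every frontier entry stores its own index
        frontier = {i: i for i in frontier}
        # SpMV over the (select-2nd, select-2nd) semiring: each reached
        # vertex j receives the id of one frontier vertex that points to it
        product = {}
        for i, value in frontier.items():
            for j in adj[i]:
                product[j] = value  # "add" keeps the latest value seen
        # keep only vertices that have no parent yet
        frontier = {j: p for j, p in product.items() if parents[j] == -1}
        # record the parents of the newly discovered vertices
        parents.update(frontier)
    return parents


# Hypothetical directed graph on vertices 1 to 7, searched from vertex 7.
adj = {1: [], 2: [], 3: [4], 4: [1, 2], 5: [6], 6: [], 7: [3, 4, 5]}
parents = bfs_parents(adj, 7)
print(parents)
```

Because both semiring operations return their second argument, the product carries vertex ids rather than numeric sums, so one multiplication both advances the frontier and proposes a parent for each vertex it reaches.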
![Page 19: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/19.jpg)
BFS comparison with PBGL

| Core count (machine) | Code | Scale 19 | Scale 22 | Scale 24 |
| --- | --- | --- | --- | --- |
| 4 (Neumann) | PBGL | 3.8 | 2.5 | 2.1 |
| 4 (Neumann) | KDT | 8.9 | 7.2 | 6.4 |
| 16 (Neumann) | PBGL | 8.9 | 6.3 | 5.9 |
| 16 (Neumann) | KDT | 33.8 | 27.8 | 25.1 |
| 128 (Carver) | PBGL | - | 25.9 | 39.4 |
| 128 (Carver) | KDT | - | 237.5 | 262.0 |
| 256 (Carver) | PBGL | - | 22.4 | 37.5 |
| 256 (Carver) | KDT | - | 327.6 | 473.4 |

Performance comparison of KDT and PBGL breadth-first search. The reported numbers are in MegaTEPS, or 10^6 traversed edges per second. The graphs are Graph500 RMAT graphs with 2^scale vertices and 16*2^scale edges.
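The problem sizes and the MegaTEPS unit used in this comparison follow from simple arithmetic. The short sketch below works it out; the function and parameter names are illustrative choices, not part of KDT or the Graph500 reference code.

```python
def graph500_size(scale, edge_factor=16):
    """Return (vertices, edges) for an RMAT graph at the given scale."""
    vertices = 2 ** scale
    return vertices, edge_factor * vertices


def mega_teps(edges_traversed, seconds):
    """Traversal rate in MegaTEPS (10^6 traversed edges per second)."""
    return edges_traversed / seconds / 1e6


for scale in (19, 22, 24, 29):
    vertices, edges = graph500_size(scale)
    print(f"scale {scale}: {vertices:,} vertices, {edges:,} edges")
```

At scale 29 this gives 536,870,912 vertices and 8,589,934,592 edges, which is where the rounded "500M vertices, 8B edges" figure comes from.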
![Page 20: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/20.jpg)
Connectivity only.
Plain graph
![Page 21: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/21.jpg)
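Edge Attributes (semantic graph)

    class edge_attr:
        isText
        isPhoneCall
        weight

[Figure: the example graph with each edge labeled (isText, isPhoneCall, weight). Edge labels shown: (T, F, 0), (F, T, 1), (T, F, 3), (T, F, 2), (T, T, 3), (T, T, 1), (F, T, 1), (F, T, 4), (T, T, 5), (T, F, 0), (T, F, 2), (F, F, 0).]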
![Page 22: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/22.jpg)
Edge Attribute Filter

    class edge_attr:
        isText
        isPhoneCall
        weight

    G.addEFilter(lambda e: e.weight > 0)

[Figure: the same graph after the filter. Remaining edge labels: (F, T, 1), (T, F, 3), (T, F, 2), (T, T, 3), (T, T, 1), (F, T, 1), (F, T, 4), (T, T, 5), (T, F, 2).]
![Page 23: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/23.jpg)
Edge Attribute Filter Stack

    class edge_attr:
        isText
        isPhoneCall
        weight

    G.addEFilter(lambda e: e.weight > 0)
    G.addEFilter(lambda e: e.isPhoneCall)

[Figure: the same graph after both filters. Remaining edge labels: (F, T, 1), (T, T, 3), (T, T, 1), (F, T, 1), (F, T, 4), (T, T, 5).]
![Page 24: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/24.jpg)
Filter implementation details
• Filter defined as a unary predicate
  – operates on edge or vertex value
  – written in Python
  – predicates checked in order they were added
• Each KDT object maintains a stack of filter predicates
  – all operations respect filter
    • enables filter-ignorant algorithm design
    • enables algorithm designers to use filters
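A stack of unary predicates can be sketched in a few lines of plain Python. The `FilteredEdges` container and the `EdgeAttr` dataclass below are hypothetical stand-ins, not KDT classes; only the method name `addEFilter` and the attribute names are borrowed from the slides. The twelve edge labels are the ones shown in the edge-attribute figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeAttr:
    isText: bool
    isPhoneCall: bool
    weight: int


class FilteredEdges:
    """Hypothetical edge container that keeps a stack of predicates."""

    def __init__(self, edges):
        self._edges = list(edges)
        self._filters = []

    def addEFilter(self, predicate):
        self._filters.append(predicate)

    def __iter__(self):
        # an edge is visible only if every predicate accepts it;
        # all() evaluates the predicates in the order they were added
        for e in self._edges:
            if all(f(e) for f in self._filters):
                yield e


T, F = True, False
labels = [(T, F, 0), (F, T, 1), (T, F, 3), (T, F, 2), (T, T, 3), (T, T, 1),
          (F, T, 1), (F, T, 4), (T, T, 5), (T, F, 0), (T, F, 2), (F, F, 0)]
g = FilteredEdges(EdgeAttr(*label) for label in labels)

g.addEFilter(lambda e: e.weight > 0)
after_weight = len(list(g))   # edges left after the weight filter
g.addEFilter(lambda e: e.isPhoneCall)
after_calls = len(list(g))    # edges left after both filters
print(after_weight, after_calls)
```

Any code that iterates over `g` sees only the surviving edges, which is the sense in which an algorithm can be written without knowing that filters exist.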
![Page 25: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/25.jpg)
Two filter modes
• On-The-Fly filters
  – predicate checked each time an operation touches vertex or edge
• Materialized filters
  – make copy of graph which excludes filtered elements
    • predicate checked only once for each element
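The difference between the two modes shows up in how often the predicate runs. The following sketch counts predicate evaluations for both strategies; the `CountingPredicate` wrapper and the number of passes are made up for illustration, and the weights are those of the example edge labels from the earlier slides.

```python
class CountingPredicate:
    """Wrap a predicate and count how many times it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)


weights = [0, 1, 3, 2, 3, 1, 1, 4, 5, 0, 2, 0]
passes = 3  # hypothetical number of sweeps an algorithm makes over the edges

# On-the-fly: the predicate runs every time an edge is touched.
otf = CountingPredicate(lambda w: w > 0)
for _ in range(passes):
    visible = [w for w in weights if otf(w)]

# Materialized: filter once into a copy, then sweep the copy.
mat = CountingPredicate(lambda w: w > 0)
materialized = [w for w in weights if mat(w)]
for _ in range(passes):
    visible_copy = list(materialized)

print(otf.calls, mat.calls)
```

Both strategies expose the same edges, but the on-the-fly version pays for the predicate on every pass, while the materialized version pays once and stores a second copy.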
![Page 26: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/26.jpg)
Performance of On-The-Fly filter vs. Materialized filter
• For restrictive filter
  – OTF can be cheaper since fewer edges are touched
    • corpus can be huge, but only traverse small pieces
• For non-restrictive filter
  – OTF saves space (no need to keep two large copies)
  – OTF makes each operation more computationally expensive
![Page 27: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/27.jpg)
texts and phone calls
    # draw graph
    draw(G)

    # Each edge has this attribute:
    class edge_attr:
        isText
        isPhoneCall
        weight
![Page 28: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/28.jpg)
Betweenness Centrality
    bc = G.centrality("approxBC")
    # draw graph with node sizes
    # proportional to BC score
    draw(G, bc)
![Page 29: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/29.jpg)
Betweenness Centrality on texts
    # BC only on text edges
    G.addEFilter(lambda e: e.isText)
    bc = G.centrality("approxBC")
    # draw graph with node sizes
    # proportional to BC score
    draw(G, bc)
![Page 30: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/30.jpg)
Betweenness Centrality on calls
    # BC only on phone call edges
    G.addEFilter(lambda e: e.isPhoneCall)
    bc = G.centrality("approxBC")
    # draw graph with node sizes
    # proportional to BC score
    draw(G, bc)
![Page 31: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/31.jpg)
SEJITS
• Selective Embedded Just-In-Time Specialization
1. Take Python code
2. Translate it to equivalent C++ code
3. Compile with GCC
4. Call compiled version instead of Python version
The way to make Python fast is to not use Python. -- Me
![Page 32: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/32.jpg)
BFS with SEJITS
Time (in seconds) for a single BFS iteration on Scale 25 RMAT (33M vertices, 500M edges) with 10% of elements passing filter. Machine is NERSC’s Hopper.
![Page 33: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/33.jpg)
BFS with SEJITS
Time (in seconds) for a single BFS iteration on Scale 23 RMAT (8M vertices, 130M edges) with 10% of elements passing filter. Machine is Mirasol.
![Page 34: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/34.jpg)
Roofline
• A way to find what your bottleneck is
• MEASURE and PLOT potential limiting factors in your exact system and program
  – compute power
  – RAM stream speed
  – RAM random access speed
  – disk
  – etc.
• Your Roofline is the minimum of your plots
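Taking the minimum over measured limits can be written down directly. The sketch below is one simple way to model it, assuming each limit is either a flat rate or a bandwidth that scales with the work done per byte moved; the function, the ceiling names, and all numbers are hypothetical and are not measurements from the slides.

```python
def roofline(intensity, ceilings):
    """Attainable rate at a given intensity (operations per byte).

    ceilings maps a name to a pair (kind, value): kind "flat" is a fixed
    operations-per-second limit, kind "bandwidth" is bytes per second and
    scales with intensity. Returns (limiting factor, attainable rate).
    """
    bounds = {}
    for name, (kind, value) in ceilings.items():
        bounds[name] = value if kind == "flat" else value * intensity
    limiting = min(bounds, key=bounds.get)
    return limiting, bounds[limiting]


# Hypothetical measurements for one machine and one program.
ceilings = {
    "compute": ("flat", 50e9),               # operations per second
    "RAM stream": ("bandwidth", 20e9),       # bytes per second
    "RAM random access": ("bandwidth", 2e9), # bytes per second
}
print(roofline(0.5, ceilings))
```

Whichever limit is lowest at the program's operating point names the bottleneck, so that is the factor worth optimizing first.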
![Page 35: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/35.jpg)
KDT + SEJITS Roofline
Good (limited by DRAM)
Bad (compute limited)
![Page 36: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/36.jpg)
Is MapReduce any good for graphs?
The prospect of the entire graph traversing the cloud fabric for each MapReduce job is disturbing.
- Jonathan Cohen
![Page 37: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/37.jpg)
PageRank comparison with Pegasus

| Core count | Task count | Code | Scale 19 | Scale 21 |
| --- | --- | --- | --- | --- |
| - | 4 | Pegasus (MapReduce-based) | 2h 35m 10s | 6h 06m 10s |
| 4 | - | KDT | 55s | 7m 12s |
| - | 16 | Pegasus (MapReduce-based) | 33m 09s | 4h 40m 08s |
| 16 | - | KDT | 13s | 1m 34s |

Performance comparison of KDT and Pegasus PageRank (ε = 10^-7). The graphs are Graph500 RMAT graphs. The machine is Neumann, a 32-core shared memory machine with HDFS mounted in a ramdisk.
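For readers unfamiliar with the computation being timed, PageRank is usually computed by power iteration until the ranks change by less than a tolerance ε. The sketch below is a plain-Python illustration and is neither KDT's nor Pegasus's implementation; the damping factor 0.85, the L1 stopping test, and the three-vertex graph are assumptions not stated on the slides.

```python
def pagerank(adj, damping=0.85, eps=1e-7, max_iter=1000):
    """Power iteration; stops when the L1 change drops below eps."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(max_iter):
        # rank held by vertices without out-edges is spread uniformly
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in adj}
        for u, outs in adj.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[v] - rank[v]) for v in adj)
        rank = new
        if delta < eps:
            break
    return rank


# Hypothetical directed graph: vertices 1 and 2 link to 3, and 3 links to 1.
ranks = pagerank({1: [3], 2: [3], 3: [1]})
print(ranks)
```

Each iteration is one sparse matrix-vector product, so a linear-algebra system like KDT keeps the graph in memory across iterations rather than re-reading it for every step.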
![Page 38: K nowledge D iscovery T oolbox](https://reader035.vdocuments.mx/reader035/viewer/2022062323/5681526b550346895dc09f17/html5/thumbnails/38.jpg)
A scalability limit for matrix-matrix multiplication: sqrt(p)
Million Traversed Edges Per Second in Betweenness Centrality computation. BC algorithm is composed of multiple BFS searches batched together into matrices and using SpGEMM for traversals.