![Page 1: Galois Performance Mario Mendez-Lojo Donald Nguyen](https://reader036.vdocuments.mx/reader036/viewer/2022081514/56649d2f5503460f94a06dcd/html5/thumbnails/1.jpg)
Galois Performance
Mario Mendez-Lojo, Donald Nguyen
Overview
• The Galois system is a test bed for exploring optimizations
  – Safe but not fast out of the box
• Important optimizations
  – Select the least transactional overhead
  – Select the right scheduling
  – Select the appropriate data structure
• Quantify the effect of these optimizations on applications
Algorithms
Irregular algorithms, classified by:
• topology: general graph, grid, tree
• operator: morph, local computation, reader
• ordering: unordered, ordered
1. Barnes-Hut
2. Delaunay Mesh Refinement
3. Preflow-push
Methodology
[Figure: execution timeline (axes: Threads, Time) with regions labeled Idle, Serial, GC, and Compute]
• Abort ratio: aborted iterations / total iterations
• GC options
  – UseParallelGC
  – UseParallelOldGC
  – NewRatio=1
Terms
• Base
  – Default scheduling, default graph
• Serial
  – Galois classes => classes without concurrency control
• Speedup
  – Measured against the best mean performance of a serial variant
• Throughput
  – Number of serial iterations / time
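The ratios used in this talk can be written as a few one-line helpers. This is a minimal sketch: the class and method names are hypothetical and not part of Galois, and the choice of milliseconds as the time unit is an assumption.

```java
/** Hypothetical helpers for the metrics used in this talk; not part of the Galois API. */
final class GaloisMetrics {
    private GaloisMetrics() {}

    /** Abort ratio: aborted iterations divided by total iterations. */
    static double abortRatio(long abortedIterations, long totalIterations) {
        if (totalIterations <= 0) {
            throw new IllegalArgumentException("totalIterations must be positive");
        }
        return (double) abortedIterations / totalIterations;
    }

    /** Speedup: best mean serial time divided by the parallel time. */
    static double speedup(double bestSerialMillis, double parallelMillis) {
        if (parallelMillis <= 0) {
            throw new IllegalArgumentException("parallelMillis must be positive");
        }
        return bestSerialMillis / parallelMillis;
    }

    /** Throughput: serial iterations per unit of time (assumed here: per millisecond). */
    static double throughput(long serialIterations, double elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return serialIterations / elapsedMillis;
    }

    public static void main(String[] args) {
        // Barnes-Hut figures reported in the results slide: 10271 ms serial, 1553 ms parallel.
        System.out.println("speedup = " + speedup(10271, 1553));
    }
}
```

With the Barnes-Hut times reported in the results (10271 ms serial, 1553 ms parallel), `speedup` returns roughly 6.61, which matches the 6.6X on the slide.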
Numbers
• Runtime
  – Last of 5 runs in same VM
  – Ignore time to read and construct initial graph
• Other statistics
  – Last of 5 runs
Test Environment
• 2 x Xeon X5570 (4 core, 2.93 GHz)
• Java 1.6.0_0-b11
• Linux 2.6.24-27 x86_64
• 20GB heap size
BARNES-HUT
[Image: Most Distant Galaxy Candidates in the Hubble Ultra Deep Field]
Barnes-Hut
• N-body algorithm
  – Oct-tree acceleration structure
  – Serial
    • Tree build, center of mass, particle update
  – Parallel
    • Force computation
• Structure
  – Reader on tree
• Variants
  – Splash2, Reader Galois
Reader Optimization
Base:
child = octree.getNeighbor(nn, 1);
Reader-optimized:
child = octree.getNeighbor(nn, 1, MethodFlag.NONE);
ParaMeter Profile
Barnes-Hut Results
100,000 points, 1 time step
Best serial: base
Serial time: 10271 ms
Best parallel time: 1553 ms
Best speedup: 6.6X
Barnes-Hut Scalability
DELAUNAY MESH REFINEMENT
Delaunay Mesh Refinement
• Refine “bad” triangles
  – Maintained in worklist
• Structure
  – Cautious operator on graph
• Variants
  – Flag optimized, locallifo
base: Priority.defaultOrder()
local lifo: Priority.first(ChunkedFIFO.class).thenLocally(LIFO.class)
Cautious Optimization
Base:
mesh.contains(item);
...
mesh.remove(preNodes.get(i));
...
mesh.add(node);
Flag-optimized:
mesh.contains(item, MethodFlag.CHECK_CONFLICT);
...
mesh.remove(preNodes.get(i), MethodFlag.NONE);
...
mesh.add(node, MethodFlag.NONE);
• No need to save undo info
• Only check conflicts up to first write
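Why a cautious operator can skip undo information is easier to see in a toy model. The sketch below is a hypothetical illustration, not the Galois runtime: `CautiousSketch`, `claim`, `refine`, and `release` are invented names, and the ownership table is an assumed stand-in for conflict detection. The read phase loosely corresponds to the calls that keep `MethodFlag.CHECK_CONFLICT`, and the write phase to the calls that use `MethodFlag.NONE`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a cautious operator; a hypothetical illustration, not the Galois runtime. */
final class CautiousSketch {
    // Assumed stand-in for conflict detection: element -> iteration that owns it.
    private final Map<Integer, Integer> owner = new HashMap<>();
    final Set<Integer> mesh = new HashSet<>();

    /**
     * Read phase: claim every element of the neighborhood.
     * Returns false on a conflict, before the mesh has been modified.
     */
    boolean claim(int iteration, List<Integer> neighborhood) {
        for (int element : neighborhood) {
            Integer current = owner.get(element);
            if (current != null && current != iteration) {
                return false; // conflict detected while nothing has been written
            }
        }
        for (int element : neighborhood) {
            owner.put(element, iteration);
        }
        return true;
    }

    /** Write phase: runs only after claim() succeeded, so it does no checks and keeps no undo log. */
    void refine(List<Integer> toRemove, int toAdd) {
        mesh.removeAll(toRemove);
        mesh.add(toAdd);
    }

    /** Releases every element held by a finished iteration. */
    void release(int iteration) {
        owner.values().removeIf(o -> o == iteration);
    }
}
```

Because every conflict surfaces in `claim`, an aborted iteration has not yet changed the mesh, so there is nothing to roll back.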
LIFO Optimization
Base:
GaloisRuntime.foreach(..., Priority.defaultOrder());
Local LIFO:
GaloisRuntime.foreach(..., Priority.first(ChunkedFIFO.class).thenLocally(LIFO.class));
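One plausible reading of a chunked-FIFO order with a local LIFO order is sketched below as a single-threaded simulation. It is an assumption about the scheduling policy, not the Galois implementation: `LocalLifoSketch`, `processingOrder`, and the rule that odd items below 100 spawn a follow-up item are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Single-threaded toy of "chunks in FIFO order, items within a chunk in LIFO order". */
final class LocalLifoSketch {
    private LocalLifoSketch() {}

    static List<Integer> processingOrder(List<Integer> initial, int chunkSize) {
        // Global worklist: fixed-size chunks handed out in FIFO order.
        Deque<List<Integer>> chunks = new ArrayDeque<>();
        for (int i = 0; i < initial.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, initial.size());
            chunks.addLast(new ArrayList<>(initial.subList(i, end)));
        }

        List<Integer> order = new ArrayList<>();
        Deque<Integer> local = new ArrayDeque<>(); // thread-local stack (LIFO)
        while (!chunks.isEmpty()) {
            for (int item : chunks.pollFirst()) {
                local.push(item);
            }
            while (!local.isEmpty()) {
                int item = local.pop();
                order.add(item);
                // Made-up rule: an odd item below 100 creates one new work item.
                if (item % 2 == 1 && item < 100) {
                    local.push(item + 100);
                }
            }
        }
        return order;
    }
}
```

Calling `processingOrder(Arrays.asList(1, 2, 3, 4), 2)` yields `[2, 1, 101, 4, 3, 103]`: newly created work runs immediately after the item that produced it, ahead of older chunks.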
ParaMeter Profile
DMR Results
0.5M triangles, 0.25M bad triangles
Best serial: locallifo.flagopt
Serial time: 17002 ms
Best parallel time: 3745 ms
Best speedup: 4.5X
PREFLOW-PUSH
Preflow-push
• Max-flow algorithm
  – Nodes push flow downhill
• Structure
  – Cautious, local computation
• Variants
  – Flag optimized, local computation graph
base (discharge): Priority.first(Bucketed.class, numHeight+1, false, indexer).then(FIFO.class)
base (relabel): Priority.first(ChunkedFIFO.class, 8)
Local Computation Optimization
Base:
graph = ...
Local computation graph:
graph = ...
b = new LocalComputationGraph.ObjectGraphBuilder();
graph = b.from(graph).create();
ParaMeter Profile
Preflow-push Results
From challenge problem (genmf-wide)
14 linearly connected grids (194x194), 526,904 nodes, 2,586,020 edges
http://avglab.com/andrew/CATS/maxflow_synthetic.htm
C: 11450 ms
Java: 30234 ms
Best serial: lc.flagopt
Serial time: 57121 ms
Best parallel time: 18242 ms
Best speedup: 3.1X
Preflow-push Scalability
What performance did we expect?
[Figure: execution timeline (axes: Threads, Time) with regions labeled Idle, Serial, GC, parallel Compute, and Misspeculation; further labels: Measured Indirectly, Synchronization, …, Error]
What performance did we expect?
• Naïve: $r(x) = t_1 / x$
• Amdahl: $r(x) = t_p / x + t_s$
  $t_1 = t_p + t_s$
  $t_s = t_{idle} + t_{gc} + t_{serial}$
• Simple: $r(x) = (t_p \cdot (i_x / i_1)) / x + t_s$
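The three models translate directly into code. The sketch below is a minimal, hypothetical rendering: the class and method names are invented, times are assumed to be in milliseconds, and i_x is assumed to be the number of iterations executed with x threads (so that i_x / i_1 captures the extra work caused by misspeculation).

```java
/** Hypothetical sketch of the three expected-runtime models; x is the number of threads. */
final class RuntimeModels {
    private RuntimeModels() {}

    /** Naive: the whole single-thread time t1 scales perfectly. */
    static double naive(double t1, int x) {
        return t1 / x;
    }

    /** Amdahl: only the parallel part tp scales; ts = tIdle + tGc + tSerial does not. */
    static double amdahl(double tp, double ts, int x) {
        return tp / x + ts;
    }

    /** Simple: like Amdahl, but tp is first scaled by the iteration ratio ix / i1. */
    static double simple(double tp, double ts, long ix, long i1, int x) {
        return (tp * ((double) ix / i1)) / x + ts;
    }

    public static void main(String[] args) {
        // Made-up inputs for illustration only; they are not measurements from the talk.
        double tp = 7000, ts = 1000;
        System.out.println("naive  = " + naive(tp + ts, 8));
        System.out.println("amdahl = " + amdahl(tp, ts, 8));
        System.out.println("simple = " + simple(tp, ts, 1200, 1000, 8));
    }
}
```

With these made-up inputs the predictions are 1000 ms, 1875 ms, and about 2050 ms: each refinement of the model accounts for one more source of lost speedup.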
Barnes-Hut
Delaunay Mesh Refinement
Preflow-push
Summary
• Many profitable optimizations
  – Selecting among method flags, worklists, graph variants
• Open topics
  – Automation
  – Static, dynamic and performance analysis
  – Efficient ordered algorithms