enabling high performance computational dynamics in a...
TRANSCRIPT
![Page 1: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/1.jpg)
Enabling High Performance Computational Dynamics
in a Heterogeneous Hardware Ecosystem
Assistant Prof. Dan Negrut
Simulation-Based Engineering Lab
Department of Mechanical Engineering
University of Wisconsin – Madison
Toward High-Performance Computing Support
for the Simulation and Planning of Robot Contact Tasks
June 27, 2011
![Page 2: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/2.jpg)
Acknowledgements
� People:
� Alessandro Tasora – University of Parma, Italy
� Mihai Anitescu – Argonne National Lab
� Lab Students:
� Hammad Mazhar
� Toby Heyn
� Andrew Seidl
� Justin Madsen
� Naresh Khude
� Makarand Datar
� Financial support� National Science Foundation, Young Investigator Career Award
� FunctionBay, S. Korea
� NVIDIA
� Microsoft
� Caterpillar
2
� Andrew Seidl
� Arman Pazouki
� Hamid Ansari
� Makarand Datar
� Dan Melanz
� Markus Schmid
![Page 3: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/3.jpg)
Talk Overview
� Overview of the engineering problems of interest
� Large-scale Multibody Dynamics� Problem formulation, solution method, and parallel implementation
� Overview of Heterogeneous Computing Template (HCT)
� Numerical Experiments
� Validation efforts
� Conclusions3
![Page 4: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/4.jpg)
Computational Multibody Dynamics
Simulation generated in ADAMS
4
![Page 5: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/5.jpg)
Multi-Physics…Fluid-Solid Interaction: Navier-Stokes + Newton-Euler.
5
![Page 6: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/6.jpg)
Computational Dynamics
6
![Page 7: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/7.jpg)
Rover Mobility on Granular Terrain
� Wheeled/tracked vehicle mobility on granular terrain
� Also interested in scooping and loading granular material
7
Simulation in Chrono::Engine
![Page 8: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/8.jpg)
Frictional Contact Simulation[Commercial Solution]
� Model Parameters:� Spheres: 60 mm diameter and mass 0.882 kg� Forces: smoothing with stiffness of 1E5, force
exponent of 2.2, damping coefficient of 10.0, and a penetration depth of 0.1
� Simulation length: 3 seconds
8
![Page 9: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/9.jpg)
Frictional Contact: Two Different Approaches
Considered
� Discrete Element Method (DEM) - draws on a “smoothing” (penalty) approach
� Lots of heuristics
� Slow
� General purpose
� Used in ADAMS� Used in ADAMS
� DVI-based (Differential Variational Inequalities)� A set of differential equations combined with inequality constraints
� Fast (stable for significantly larger integration step-sizes)
� Less general purpose
� Used widely in computer games
9
![Page 10: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/10.jpg)
The Modeling Component
10
![Page 11: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/11.jpg)
Equations of Motion: Multibody Dynamics
11
![Page 12: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/12.jpg)
Traditional Discretization Scheme
positionstime step index
speeds Reaction
impulsesApplied ForcesMass Mat.
Coulomb 3D fricion
model
Complementarity
Condition
Stabilization
term
(Stewart & Trinkle, 1996)12
![Page 13: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/13.jpg)
Relaxed Discretization Scheme Used
Relaxation Term
(Anitescu & Tasora, 2008)13
![Page 14: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/14.jpg)
� Introduce the convex hypercone...
The Cone Complementarity Problem
(CCP)
... and its polar hypercone:
� First order optimality conditions lead to Cone Complementarity Problem
CCP assumes following form: Find γ such that
14
![Page 15: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/15.jpg)
Putting Things in Perspective…
15
� Friction model posed as an optimization problem
� Working with velocity and impulses rather than acceleration and forces
� Contact complementarity expression altered to lead to CCP
� Three key points led to above algorithm:
![Page 16: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/16.jpg)
Implementation
� Method outlined implemented using two loops
� Outer loop – runs the time stepping
� Inner loop – CCP Algorithm (solves CCP problem at each time step)
16
![Page 17: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/17.jpg)
Granular Dynamics:How Parallel Computing is Leveraged
1. Parallel Collision Detection
2. (Body parallel) Force kernel
3. (Contact parallel) Contact preprocessing kernel
4. (Contact parallel) CCP contact kernel
Ou
ter L
oo
p
4. (Contact parallel) CCP contact kernel
5. (Constraint parallel) CCP constraint kernel
6. (Reduction-slot parallel) Velocity reduction kernel
7. (Body parallel) Body velocity update kernel
8. (Body parallel) Time integration kernel
Inn
er
Lo
op
17
Ou
ter L
oo
p
![Page 18: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/18.jpg)
Inner Loop (CCP Algorithm)
18
![Page 19: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/19.jpg)
Large Scale Granular Dynamics
� Numerical solution can leverage parallel computing
19
![Page 20: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/20.jpg)
800
1000
1200
Tesla 10-series
Tesla 20-series
CPU vs. GPU – Flop Rate(GFlop/Sec)
Single Precision
Double Precision
0
200
400
600
800
2003 2004 2005 2006 2007 2008 2009 2010
Tesla 8-series
Nehalem3 GHz
Westmere3 GHz
Tesla 20-series
Tesla 10-series
20
![Page 21: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/21.jpg)
CPU vs. GPU– Memory Bandwidth[GB/sec]
100
120
140
160
Tesla 10-series
Tesla 20-series
21
0
20
40
60
80
100
2003 2004 2005 2006 2007 2008 2009 2010
Tesla 8-series
Nehalem 3 GHz
Westmere3 GHz
![Page 22: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/22.jpg)
Mixing 40,000 Spheres on the GPU
22
![Page 23: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/23.jpg)
300K Spheres in Tank[parallel on the GPU]
23
![Page 24: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/24.jpg)
1.1 Million Rigid Spheres[parallel on the GPU]
24
![Page 25: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/25.jpg)
A Heterogeneous Computing Templatefor for
Computational Dynamics
25
![Page 26: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/26.jpg)
Heterogeneous Cluster
26
Second fastest cluster at University of Wisconsin-Madison
![Page 27: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/27.jpg)
Computation Using Multiple CPUs[DEM solution]
27
![Page 28: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/28.jpg)
Computation Using Multiple CPUs[DEM solution]
28
![Page 29: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/29.jpg)
Computation Using Multiple CPUs[DEM solution]
29
![Page 30: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/30.jpg)
True Heterogeneous Computing
� Use 1 GPU per MPI process
� Each process uses its GPU to perform the collision detection � Each process uses its GPU to perform the collision detection
task during each time step
� As before, CPU takes care of communication between sub-domains
� State data is copied to GPU, CD is performed, and collision data is copied back
� GPU is used as an accelerator/co-processor
30
![Page 31: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/31.jpg)
Demonstration
16,000 bodies
2 sub-domains
CPU:
CPU+GPU: 9.43 hrs
31
![Page 32: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/32.jpg)
Heterogeneous Computing TemplateFive Major Components
� Computational Dynamics requires
� Domain decomposition
� Proximity computation
� Inter-domain data exchange� Inter-domain data exchange
� Numerical algorithm support
� Post-processing (visualization)
� HCT represents the library support and associated API that capture this five component abstraction
32
![Page 33: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/33.jpg)
Typical Simulation Results…
� LEFT: Infinity norm of the residual vs. iteration index in the CCP solution� Convergence rate (slope of curve) becomes smaller as the iteration index increases.
33
� RIGHT: Infinity norm of the CCP residual after rmax iterations as function of
granular material depth (number of spheres stacked on each other).
![Page 34: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/34.jpg)
Searching for Better Methods
� Frictionless case (bound constraints in place)
� Gauss-Jacobi (CE)
� Projected conjugate gradient (ProjCG)
� Gradient projected conjugate gradient (GPCG)� Gradient projected conjugate gradient (GPCG)
� Gradient projected MINRES (GPMINRES)
� Friction case (cone constraints - ongoing)
� Newton’s Method for large bound-constrained problems� Uses re-parameterization to handle friction cones (replace with bound constraints)
34
![Page 35: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/35.jpg)
Numerical Experiments
� Test Problem: 40,000 bodies ⇒ 157,520 contacts
� Frictionless
35
![Page 36: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/36.jpg)
Test Problem (MATLAB)
Method Iterations
Final Residual
Norm
γmin γmax Time [sec]
CE 1000 6.11 x 10-2 0.0 2.0598 1849.5
ProjCG 1002 5.6344 x 10-4 0.0 2.2286 1235.6
GPCG 1600 1.0675 x 10-4 0.0 2.6349 382.3644
GPMinres 1100 9.5239 x 10-5 0.0 2.3090 238.0744
PCG 1000 2.4053 x 10-4 -1.1116 2.5254 27.9686
GMRES 1000 4.5315 x 10-5 -1.1635 2.5227 736.3007
MINRES 1000 1.6979 x 10-5 -1.1316 2.5253 41.5790
36
![Page 37: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/37.jpg)
Proximity ComputationProximity Computation
37
![Page 38: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/38.jpg)
GPU Collision Detection (CD)
� 30,000 feet perspective:
� Carry out spatial partitioning of the volume occupied by the bodies� Carry out spatial partitioning of the volume occupied by the bodies� Place bodies in bins (cubes, for instance)
� Follow up by brute force search for all bodies touching each bin� Embarrassingly parallel
38
![Page 39: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/39.jpg)
� Example: 2D collision detection, bins are squares
Basic Idea: Search for Contacts in Different Bins
in Parallel
39
![Page 40: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/40.jpg)
Ellipsoid-Ellipsoid CD: Results
30
35
40
45
Tim
e (
seco
nd
s)
Time vs. Number of Contacts
120
140
160
180
Speedup - GPU vs. CPU
40
0
5
10
15
20
25
30
0 500,000 1,000,000 1,500,000
Tim
e (
seco
nd
s)
Number of Contacts
0
20
40
60
80
100
120
0 500,000 1,000,000 1,500,000
Sp
eed
up
Number of Contacts
![Page 41: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/41.jpg)
Speedup - GPU vs. CPU (Bullet library)[results reported are for spheres]
140
160
180
200
GPU: NVIDIA Tesla C1060CPU: AMD Phenom II Black X4 940 (3.0 GHz)
41
0
20
40
60
80
100
120
140
0 1 2 3 4 5 6
X S
peed
up
Contacts (Millions)
![Page 42: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/42.jpg)
2.5
3
3.5
4
Tim
e (
sec)
Parallel Implementation:
Number of Contacts vs. Detection Time
[results reported are for spheres]
0
0.5
1
1.5
2
0 2 4 6 8 10 12 14 16 18 20 22 24
Tim
e (
sec)
Contacts (Millions)42
![Page 43: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/43.jpg)
Multiple-GPU Collision Detection
Assembled Quad GPU Machine
Processor: AMD Phenom II X4 940 Black
Memory: 16GB DDR2
Graphics: 4x NVIDIA Tesla C1060
Power supply 1: 1000W
Power supply 2: 750W
43
![Page 44: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/44.jpg)
SW/HW Setup
Main Data Set
Results16 GB RAM
Thread
0
Thread
1
Thread
3
Thread
2
GPU
0
GPU
1
GPU
3
GPU
2
Open MP Quad Core AMD
Microprocessor
Tesla C1060
4x4 GB Memory
4x30720 threadsCUDA
44
![Page 45: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/45.jpg)
Results – Contacts vs. Time
140
160
180
200
Tim
e (
Sec)
Quad Tesla C1060 Configuration
45
0
20
40
60
80
100
120
0 1 2 3 4 5 6
Tim
e (
Sec)
Contacts (Billions)
![Page 46: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/46.jpg)
Conclusions
� Work aimed at enabling high-fidelity discrete models using a physics-based approach
� Approach draws on unique CPU + GPU parallel approach, which � Approach draws on unique CPU + GPU parallel approach, which leverages a Heterogeneous Computing Template
� Accomplishments to date� Billion body parallel collision detection
� Parallel solution of cone complementarity problem, about 12 million unknowns
� Early validation results encouraging
� Aiming at billion bodies simulations
46
![Page 47: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/47.jpg)
Ongoing/Future Work
� Massively parallel linear algebra for solution of CCP problem
More general collision detection code� More general collision detection code
� Multiphysics:
� Fluid-solid interaction
� Electrostatics
47
![Page 48: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/48.jpg)
Thank You.Thank You.
48
![Page 49: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/49.jpg)
49
![Page 50: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/50.jpg)
50
![Page 51: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/51.jpg)
Validation.Validation.
51
Simulation is doomed to succeed.(Rod Brooks, roboticist)
![Page 52: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/52.jpg)
Model
Simulate
Validate
� Validation at “microscale” – University of Wisconsin-Madison� Work in progress
� Validation at “macroscale” – University of Parma, Italy
52
![Page 53: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/53.jpg)
Flat Hopper Tests
Video recording from a test (a case that starts from high crystallization)
53
![Page 54: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/54.jpg)
Flat Hopper Tests
3D rendering from a simulation (4x slower than real-time) 54
![Page 55: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/55.jpg)
Flat Hopper Tests
� Comparison experimental - simulated
Experimental Simulated
55
![Page 56: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/56.jpg)
Validation atMicroscale
� Sand flow rate measurements
� Approx. 40K bodies
� Glass beads
� Diameter: 100-500 microns
56
![Page 57: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/57.jpg)
Experimental Setup
Disruptor beads
CPU connection
Nanopositioner controller
Translational stage
Load cell
Nanopositioner
57
![Page 58: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/58.jpg)
Flow Measurement, 500 micron Spheres
58
![Page 59: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/59.jpg)
Flow Simulation, 500 micron Spheres
59
![Page 60: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/60.jpg)
Flow Measurement Results, 3mm Gap Size
Weig
ht
[N]
60
Weig
ht
[N]
Time [sec]
![Page 61: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/61.jpg)
Flow Measurement Results, 2.5mm Gap Size
Weig
ht
[N]
61
Weig
ht
[N]
Time [sec]
![Page 62: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/62.jpg)
Flow Measurement Results, 2mm Gap Size
Weig
ht
[N]
62
Weig
ht
[N]
Time [sec]
![Page 63: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/63.jpg)
Flow Measurement Results, 1.5mm Gap Size
63
Weig
ht
[N]
Time [sec]
![Page 64: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/64.jpg)
Validation Experiment:Repose Angle
� Simulation� Experiment
64
φ = 19.5◦ for µ = 0.39
![Page 65: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/65.jpg)
Validation ExperimentFlow and Stagnation
65
![Page 66: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/66.jpg)
Validation,Flow and Stagnation
66
![Page 67: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/67.jpg)
Validation,Flow and Stagnation
67
![Page 68: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/68.jpg)
68
![Page 69: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/69.jpg)
69
![Page 70: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/70.jpg)
Spherical Decomposition
� Represent complex geometry as a union of spheres
� Fast parallel collision detection on GPU
� Allows non-convex geometry
� NOTE: Used only for CD, the dynamics is performed using original geometry
70
![Page 71: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/71.jpg)
Examples…
� Chain model
� 10 links
� 7,797 spheres per link
� Plow model
� 31,791 spheres in plow
blade model
� 15,000 spheres
representing terrain
71
71
![Page 72: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/72.jpg)
Numerical Results: Pebble Bed Nuclear Reactor
� Two types of tests were run
� On the GPU
� CD: NVIDIA 8800 GT
� CCP: NVIDIA Tesla C870
� On the CPU
� Single threaded
� Quad Core Intel Xeon E5430
� 2.66 GHz
� The reactor contains spheres which flow out the bottom the nozzle and are recycled to the top of the reactor
� Performed simulations with 16K, 32K, 64K, and 128K bodies72
![Page 73: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/73.jpg)
73
![Page 74: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/74.jpg)
Results: Average Duration for Taking One Integration Step [∆∆∆∆t=0.01 s]
y = 0.0007x - 6.3447R² = 0.996
70
80
90
100
Pebble Bed Reactor Demo, GPU vs. CPU
GPU Average Total Time CPU average Total Time Linear (GPU Average Total Time) Linear (CPU average Total Time)
y = 6E-05x - 0.51R² = 0.9936
0
10
20
30
40
50
60
70
0 16000 32000 48000 64000 80000 96000 112000 128000
Av
era
ge T
ota
l T
ime
# of spheres in pebble bed reactor74
![Page 75: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/75.jpg)
Algorithmic Challenges: Gauss-Jacobi Doesn’t Scale Well
� Convergence stalls for certain classes of problems
� Very large problems
� Problems where I have many body stacked on each other
� Inherent problem with Gauss-Jacobi, pertains the propagation of � Inherent problem with Gauss-Jacobi, pertains the propagation of
information
� Granular dynamics problem have an intrinsic degree of
redundancy
75
![Page 76: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/76.jpg)
Ellipsoid-Ellipsoid CD: Visualization
76
![Page 77: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/77.jpg)
Example: Ellipsoid-Ellipsoid CD
1n
1P 2
ε
1 2 1 2 1 2
1 2
1 1( ) ( )2 2λ λ
= = + +d P - P M M c b - b
2 22
1 2 1 2,i i i i j i j i j
α α α α α α α α α
∂ ∂ ∂ ∂∂ ∂= − = −
∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂
P P P Pd d
Rotation MatrixA :
77
1α
2α
c
1ε
2n
2P
y
x
z
3
1 1( )2 8
T
i iα λ αλ
∂ ∂= −
∂ ∂
P cM Mcc M
2 TM = AR A
1 2 3( , , )diag r r r=R
Translation of
ellipsoids center
b :
2 1
4
Tλ = n Mn
2
3 5
3
2
3
1 3( )
8 32
1[ ) ) ]
8
1 1( )2 8
T T
i j j i
T T
i i j
T
i j
α α α αλ λ
α α αλ
λ α αλ
∂ ∂ ∂= − +
∂ ∂ ∂ ∂
∂ ∂ ∂−
∂ ∂ ∂
∂+ −
∂ ∂
P c cM Mcc M c M
c c c(c M M + Mc( M
cM Mcc M
![Page 78: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/78.jpg)
Going Back to Practical Problem of Interest
� Tracked Vehicle on Granular Terrain…
78
![Page 79: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/79.jpg)
Spherical Decomposition Steps
2: Cubit
3: Spherical Padding
1: Take shoe element
3: Spherical Padding
79
![Page 80: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/80.jpg)
Track Components1,594,908 spheres per track
80
![Page 81: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/81.jpg)
Granular Terrain Model
� Represent terrain as collection of discrete particles
� Match terrain surface profile
� Capture changing granularity with depth
81
![Page 82: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/82.jpg)
Track Simulation 1
Parameters:• Driving speed: 1.0 rad/sec
• Length: 12 seconds
• Time step: 0.005 sec
• Computation time: 18.5 hours
• Particle radius: .027273 m
• Terrain: 284,715 particles
82
![Page 83: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/83.jpg)
Track Simulation 2
Parameters:• Driving speed: 1.0 rad/sec
• Length: 10 seconds
• Time step: 0.005 sec
• Computation time: 17.8 hours
• Particle radius: .025±.0025 m
83
• Terrain: 467,100 particles
![Page 84: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/84.jpg)
Results: Track ‘Footprint’
84
![Page 85: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous](https://reader033.vdocuments.mx/reader033/viewer/2022060208/5f0417da7e708231d40c4926/html5/thumbnails/85.jpg)
Results: PositionsTrack Simulation 1
85