stan posey nvidia, santa clara, ca, usa; sposey@nvidia · ansys nexxim 15.0 q4-2013 + cuda 5 kepler...
TRANSCRIPT
![Page 2: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/2.jpg)
2
Agenda: GPU Progress and Directions for CAE
- Introduction of GPUs in HPC
- Progress of CFD on GPUs
- Review of OpenFOAM on GPUs
- Discussion on WRF Developments
![Page 3: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/3.jpg)
3
Real Application Speedups

| Speedup | Application | Organization |
|---|---|---|
| 146X | Medical Imaging | U of Utah |
| 36X | Molecular Dynamics | U of Illinois, Urbana |
| 18X | Video Transcoding | Elemental Tech |
| 50X | Matlab Computing | AccelerEyes |
| 100X | Astrophysics | RIKEN |
| 149X | Financial Simulation | Oxford |
| 47X | Linear Algebra | Universidad Jaime |
| 20X | 3D Ultrasound | Techniscan |
| 130X | Quantum Chemistry | U of Illinois, Urbana |
| 30X | Gene Sequencing | U of Maryland |
![Page 4: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/4.jpg)
4
Real Application Speedups
NOTE: Missing context is often the fault of NVIDIA, not of the organizations referenced.
Always Demand Context!
- Full application? Often kernel only, without data transfer . . .
- What is the reference CPU? Often an old and dusty x86 . . .
- How many CPU cores in the comparison? Often 1 core . . . but who uses only 1 core nowadays?
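The three questions above can be made concrete with a small calculation. The sketch below is illustrative only: every timing in it is a made-up placeholder, not a measurement from any of the organizations referenced, and the function name `speedup` and its `core_efficiency` parameter are assumptions introduced here.

```python
def speedup(cpu_1core_s, gpu_kernel_s, transfer_s=0.0, cpu_cores=1, core_efficiency=1.0):
    """Return CPU time divided by GPU time for one workload.

    cpu_1core_s     -- run time on a single CPU core (seconds)
    gpu_kernel_s    -- GPU kernel time only (seconds)
    transfer_s      -- host/device data transfer time (seconds)
    cpu_cores       -- cores used by the CPU baseline
    core_efficiency -- assumed parallel efficiency of the CPU baseline (0..1)
    """
    cpu_time = cpu_1core_s / (cpu_cores * core_efficiency)
    gpu_time = gpu_kernel_s + transfer_s
    return cpu_time / gpu_time


# Hypothetical timings, chosen only to show how the headline number moves.
CPU_1CORE_S = 100.0
GPU_KERNEL_S = 1.0
TRANSFER_S = 4.0

kernel_only = speedup(CPU_1CORE_S, GPU_KERNEL_S)
with_transfer = speedup(CPU_1CORE_S, GPU_KERNEL_S, TRANSFER_S)
vs_all_cores = speedup(CPU_1CORE_S, GPU_KERNEL_S, TRANSFER_S,
                       cpu_cores=12, core_efficiency=0.8)

print(f"kernel only, vs 1 core:       {kernel_only:.1f}x")
print(f"with data transfer, vs 1 core: {with_transfer:.1f}x")
print(f"with data transfer, vs 12 cores: {vs_all_cores:.1f}x")
```

With these placeholder inputs the same GPU run can be quoted as anything from a three-digit speedup down to roughly 2x, depending only on what the comparison includes.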
![Page 5: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/5.jpg)
5
Example: ANSYS Fluent GPU Acceleration
Preview of ANSYS Fluent 14.5 Performance, by ANSYS, Aug 2012

ANSYS Fluent AMG Solver Time (Sec), lower is better:

| CPU configuration | Dual Socket CPU | Dual Socket CPU + Tesla C2075 | Speedup |
|---|---|---|---|
| 2 x Xeon X5650, Only 1 Core Used | 2832 | 517 | 5.5x |
| 2 x Xeon X5650, All 12 Cores Used | 933 | 517 | 1.8x |

Helix Model
- Helix geometry
- 1.2M Tet cells
- Unsteady, laminar
- Coupled PBNS, DP
- AMG F-cycle on CPU
- AMG V-cycle on GPU

NOTE: All jobs solver time only
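As a check on the two speedup labels, the short sketch below recomputes them from the solver times on this slide. It assumes the 517-second GPU result applies to both CPU configurations, which is how the chart values are read here; the variable names are illustrative.

```python
# AMG solver times in seconds, as read from the Fluent chart.
CPU_ONLY_S = {
    "2 x Xeon X5650, 1 core": 2832.0,
    "2 x Xeon X5650, 12 cores": 933.0,
}
# Assumption: the same CPU + Tesla C2075 time is shown for both bars.
CPU_PLUS_GPU_S = 517.0

# Speedup = CPU-only solver time / CPU+GPU solver time.
speedups = {name: t / CPU_PLUS_GPU_S for name, t in CPU_ONLY_S.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
```

The single-core baseline yields the larger figure; against all 12 cores the same GPU run is a little under 2x.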
![Page 6: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/6.jpg)
6
GPU Computing is Mainstream
Sectors: Edu/Research, Government, Oil & Gas, Life Sciences, Finance, Manufacturing
Organizations named: Chinese Academy of Sciences; Air Force Research Laboratory; Naval Research Laboratory; Max Planck Institute; Mass General Hospital
![Page 7: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/7.jpg)
7
GPUs Now as Common to Servers as CPUs
![Page 8: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/8.jpg)
8
Supercomputing Momentum With GPUs
[Chart: # of GPU accelerated systems on Top500 (axis 0 to 60), 2007 to 2012. Milestones marked: Tesla GPUs Launched; First Double Precision GPU; Tesla Fermi 20-series Launched; Kepler Launched]
52 Tesla Accelerated Systems in June 2012 Top500 List
![Page 9: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/9.jpg)
9
ORNL TITAN: #1 on Top500 List of Supercomputers
18,688 Tesla K20X GPUs
27 Petaflops Peak, 17.59 Petaflops on Linpack
90% of Performance from GPUs
#3 on Green500
![Page 10: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/10.jpg)
10
NVIDIA GPUs Accelerate HPC at Any Scale
Same GPU Technology from MAXIMUS Workstations to TITAN, the Leader of the Top 500 at Top500.org
- MAXIMUS Workstation
- TITAN at ORNL: 20+ PetaFlops, 18,688 NVIDIA Tesla K20X
![Page 11: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/11.jpg)
11
Over 20 GPU Applications on ORNL Titan
- WL-LSMS: Role of material disorder, statistics, and fluctuations in nanoscale materials and systems.
- S3D: How are we going to efficiently burn next-generation diesel/bio fuels?
- CAM-SE: Answer questions about specific climate change adaptation and mitigation scenarios; realistically represent features like precipitation patterns/statistics and tropical storms.
- Denovo: Unprecedented high-fidelity radiation transport calculations that can be used in a variety of nuclear energy and technology applications.
- LAMMPS: Biofuels. An atomistic model of cellulose (blue) surrounded by lignin molecules, comprising a total of 3.3 million atoms. Water not shown.
- NRDF: Radiation transport – critical to astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging.
![Page 12: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/12.jpg)
12
Leadership HPC Sites Now GPU Accelerated
- Germany: Juelich; HLRS; Max Planck; TU Dresden
- UK: Cambridge; EPCC; Oxford; STFC
- Japan: Tokyo Tech; RIKEN; Tsukuba
- Rest of Europe: BSC, Spain; CINECA, Italy; CEA, France; CSCS, Switzerland
- China: NSC, Shenzhen; NSC, Tianjin; CAS IPE
- Rest of World: MSU, Russia; RAS, Russia; IITs, India
- United States: Lawrence Livermore National Labs; Oak Ridge National Labs; Sandia National Labs; NOAA; NCSA BlueWaters
![Page 13: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/13.jpg)
13
Tsubame 2.0, Tokyo Institute of Technology
TiTech: Winner of 2011 Gordon Bell Prize, Achieved with NVIDIA Tesla GPUs
“Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer” -- T. Shimokawabe, T. Aoki, et al.
Special Achievement in Scalability and Time-to-Solution
4,224 Tesla GPUs + 2,816 x86 CPUs
![Page 14: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/14.jpg)
14
World’s Most Energy Efficient Supercomputer
CINECA Eurora: “Liquid-Cooled” Eurotech Aurora Tigon
- 3150 MFLOPS/Watt
- 128 Tesla K20 Accelerators
- $100k Energy Savings / Yr
- 300 Tons of CO2 Saved / Yr
Greener than Xeon Phi, Xeon CPU
[Chart: MFLOPS/Watt (axis 0 to 3000) for CINECA Eurora (Tesla K20), NICS Beacon (Greenest Xeon Phi System), and C-DAC (Greenest CPU System)]
![Page 15: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/15.jpg)
15
Accelerated Computing: Multi-core plus Many-cores
- CPU: Optimized for Serial Tasks
- GPU Accelerator: Optimized for Many Parallel Tasks
10x Performance, 5x Energy Efficiency
![Page 16: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/16.jpg)
16
Intel Agrees: Future HPC is Hybrid Computing
- Performance is constrained by power
- Impossible to optimize for both single-thread performance and power efficiency
- The future is hybrid: few cores optimized for serial work, most cores optimized for throughput
[Diagram: Xeon (fast single threads, serial work) connected over PCIe to a GPU (extreme power-efficiency, throughput work); likewise Xeon connected over PCIe to Intel Xeon Phi]
![Page 17: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/17.jpg)
17
NVIDIA HPC Technology and Strategy
Technology
- Development of professional GPUs as co-processing accelerators for x86 CPUs: GPUs provide a cost-effective and power-efficient approach to application speedups
Strategy
- Established industry alliances to develop HPC solutions: alliances with ISVs, customers who develop HPC software, and research organizations
- Technical collaborations in applications engineering: investment in PhD engineers who work with HPC software to optimize for GPUs
- GPU integration with systems from major hardware vendors: HP and several others; Kepler K20-based systems available since 1Q 2013
![Page 18: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/18.jpg)
18
Strong Growth of GPU Accelerated Applications
Top Scientific Apps
- Computational Chemistry: AMBER, CHARMM, GROMACS, LAMMPS, NAMD, DL_POLY
- Material Science: QMCPACK, Quantum Espresso, GAMESS-US, Gaussian, NWChem, VASP
- Climate & Weather: COSMO, GEOS-5, CAM-SE, NIM, WRF
- Physics: Chroma, Denovo, GTC, GTS, ENZO, MILC
- CAE: ANSYS Mechanical, MSC Nastran, SIMULIA Abaqus, ANSYS Fluent, OpenFOAM, LS-DYNA
[Chart: # of Apps (Accelerated, In Development), 2010 to 2012, axis 0 to 200; annotated "40% Increase" and "61% Increase"]
![Page 19: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/19.jpg)
19
207 GPU-Accelerated Applications www.nvidia.com/appscatalog
![Page 20: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/20.jpg)
20
Developer Momentum Continues to Grow

| Metric | 2008 | 2013 |
|---|---|---|
| CUDA-Capable GPUs | 100M | 430M |
| Supercomputers | 1 | 50 |
| CUDA Downloads | 150K | 1.6M |
| University Courses | 60 | 640 |
| Academic Papers | 4,000 | 37,000 |
![Page 21: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/21.jpg)
21
How GPU Acceleration is Developed
Application Code = GPU part + CPU part
- GPU: Compute-Intensive Functions (the Hot Spot: 5% of Code, 50% - 75% of Profile time)
- CPU: Rest of Sequential CPU Code
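The 50% to 75% figure bounds what offloading the hot spot alone can deliver. The sketch below applies Amdahl's law to those two fractions; the hot-spot speedup values (5x, 10x, unbounded) are assumptions picked for illustration and do not come from the slides.

```python
def overall_speedup(hot_spot_fraction, hot_spot_speedup):
    """Amdahl's law: whole-application speedup when only the hot spot is accelerated.

    hot_spot_fraction -- share of profile time spent in the hot spot (0..1)
    hot_spot_speedup  -- how much faster the hot spot runs on the GPU
    """
    serial_part = 1.0 - hot_spot_fraction
    return 1.0 / (serial_part + hot_spot_fraction / hot_spot_speedup)


# Hot-spot fractions from the slide; GPU speedups are hypothetical.
for fraction in (0.50, 0.75):
    for gpu_speedup in (5.0, 10.0, float("inf")):
        total = overall_speedup(fraction, gpu_speedup)
        print(f"hot spot {fraction:.0%}, GPU {gpu_speedup}x -> overall {total:.2f}x")
```

Even with an infinitely fast GPU, a 50% hot spot caps the application at 2x and a 75% hot spot at 4x, which is why moving more of the remaining CPU code to the GPU matters.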
![Page 22: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/22.jpg)
22
Programming Strategies for GPU Acceleration
Three routes for accelerating Applications:
- Libraries: “Drop-In” Acceleration
- OpenACC Directives: GPU-acceleration in Standard Language (Fortran, C, C++)
- Programming Languages: Maximum Flexibility
Trade-offs noted on the slide: Less Portability, More Development
![Page 23: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/23.jpg)
23
GPU Accelerated Libraries: “Drop-in” Acceleration for your Applications
- Linear Algebra (FFT, BLAS, SPARSE, Matrix): NVIDIA cuFFT, cuBLAS, cuSPARSE
- Numerical & Math (RAND, Statistics): NVIDIA Math Lib, NVIDIA cuRAND
- Data Struct. & AI (Sort, Scan, Zero Sum): GPU AI – Board Games, GPU AI – Path Finding
- Visual Processing (Image & Video): NVIDIA NPP, NVIDIA Video Encode
![Page 24: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/24.jpg)
24
Select Developments using Directives and OpenACC
www.openacc-standard.org

| Software | Domain | Collaborators |
|---|---|---|
| LS-DYNA | CAE | LSTC, NVIDIA |
| Abaqus/Explicit | CAE | SIMULIA, NVIDIA |
| PAM-CRASH | CAE | ESI, CAPS |
| WRF | Climate/NWP | Cray, NVIDIA |
| COSMO | Climate/NWP | CSCS, NVIDIA |
| GEOS-5 | Climate/NWP | NASA GSFC, PGI |
| NIM | Climate/NWP | NOAA, PGI, CAPS, NVIDIA |
| S3D | Combustion | Cray, ORNL, Sandia NL, NVIDIA |
![Page 25: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/25.jpg)
25
ANSYS and NVIDIA Collaboration Roadmap

| Release | ANSYS Mechanical | ANSYS Fluent | ANSYS EM |
|---|---|---|---|
| 13.0 Dec 2010 | SMP, Single GPU, Sparse and PCG/JCG Solvers | | ANSYS Nexxim |
| 14.0 Dec 2011 | + Distributed ANSYS; + Multi-node Support | Radiation Heat Transfer (beta) | ANSYS Nexxim |
| 14.5 Nov 2012 | + Multi-GPU Support; + Hybrid PCG; + Kepler GPU Support | + Radiation HT; + GPU AMG Solver (beta), Single GPU | ANSYS Nexxim |
| 15.0 Q4-2013 | + CUDA 5 Kepler Tuning | + Multi-GPU AMG Solver; + CUDA 5 Kepler Tuning | ANSYS Nexxim; ANSYS HFSS (Transient) |
![Page 26: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/26.jpg)
26
ANSYS Focus on Implicit Sparse Solvers
Application Software = GPU part + CPU part
- GPU: Matrix Operations (50% - 75% of Profile time, Small % LoC), implemented with:
  - Hand-CUDA Parallel
  - GPU Libraries, CUBLAS
  - OpenACC Directives
- CPU: Read input, matrix Set-up; Global solution, write output
(Investigating OpenACC for more tasks on GPU)
![Page 27: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/27.jpg)
27
ANSYS Mechanical 14.5 GPU Acceleration
Results for Distributed ANSYS 14.5 with 8-Core CPUs and single GPUs
[Chart: ANSYS Mechanical Number of Jobs Per Day (axis 0 to 500), higher is better; CPU Only vs CPU + GPU]
- Westmere, Xeon X5690 3.47 GHz 8 Cores + Tesla C2075: 164 jobs per day CPU Only
- Sandy Bridge, Xeon E5-2687W 3.10 GHz 8 Cores + Tesla K20: 210 jobs per day CPU Only
V14sp-5 Model: Turbine geometry; 2,100,000 DOF; SOLID187 FEs; Static, nonlinear; One iteration (final solution requires 25); Distributed ANSYS 14.5; Direct sparse solver
Results from Supermicro X9DR3-F, 64GB memory
![Page 28: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/28.jpg)
28
ANSYS Mechanical 14.5 GPU Acceleration
Results for Distributed ANSYS 14.5 with 8-Core CPUs and single GPUs

ANSYS Mechanical Number of Jobs Per Day, higher is better:

| Platform | CPU Only | CPU + GPU | Acceleration |
|---|---|---|---|
| Westmere: Xeon X5690 3.47 GHz 8 Cores + Tesla C2075 | 164 | 341 | 2.1x |
| Sandy Bridge: Xeon E5-2687W 3.10 GHz 8 Cores + Tesla K20 | 210 | 395 | 1.9x |

V14sp-5 Model: Turbine geometry; 2,100,000 DOF; SOLID187 FEs; Static, nonlinear; One iteration (final solution requires 25); Distributed ANSYS 14.5; Direct sparse solver
Results from Supermicro X9DR3-F, 64GB memory
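Because this chart reports throughput rather than run time, the acceleration is the ratio of jobs per day with and without the GPU. The sketch below recomputes both labels and converts throughput to wall time per job. Pairing 164 with 341 (Westmere) and 210 with 395 (Sandy Bridge) is an assumption about how the bar values line up; the dictionary keys are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (CPU-only jobs/day, CPU+GPU jobs/day); the pairing of values is assumed.
JOBS_PER_DAY = {
    "Xeon X5690 + Tesla C2075": (164, 341),
    "Xeon E5-2687W + Tesla K20": (210, 395),
}

acceleration = {}
for platform, (cpu_only, cpu_gpu) in JOBS_PER_DAY.items():
    # Higher throughput is better, so the GPU figure goes in the numerator.
    acceleration[platform] = cpu_gpu / cpu_only
    cpu_s = SECONDS_PER_DAY / cpu_only   # wall time per job, CPU only
    gpu_s = SECONDS_PER_DAY / cpu_gpu    # wall time per job, CPU + GPU
    print(f"{platform}: {acceleration[platform]:.1f}x "
          f"({cpu_s:.0f} s -> {gpu_s:.0f} s per job)")
```

The faster Sandy Bridge CPU raises the CPU-only baseline, so the K20 shows a slightly smaller ratio than the C2075 even though its absolute throughput is higher.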
![Page 29: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/29.jpg)
29
NVIDIA Use of ANSYS Software in Product Engineering
ANSYS Icepak – active and passive cooling of IC packages
ANSYS Mechanical – large deflection bending of PCBs
ANSYS Mechanical – comfort and fit of 3D emitter glasses
ANSYS Mechanical – shock & vib of solder ball assemblies
![Page 30: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/30.jpg)
30
GPU Developments for Automotive CAE

| Domain | Select Automotive CAE Application | ISV Select CAE Software | GPU Status |
|---|---|---|---|
| CSM | Durability (Stress) and Fatigue | MSC Nastran | Available Today |
| CSM | Road Handling and VPG | Adams (for MBD) | Evaluation |
| CSM | Powertrain Stress Analysis | Abaqus/Standard | Available Today |
| CSM | Body NVH | MSC Nastran | Available Today |
| CSM | Crashworthiness and Safety | LS-DYNA | Implicit only, beta |
| CFD | Aerodynamics / Thermal UH | ANSYS Fluent | Available Today |
| CFD | IC Engine Combustion | STAR-CCM+ | Evaluation |
| CFD | Aerodynamics / HVAC | OpenFOAM | Available Today |
| CFD | Plastic Mold Injection | Moldflow | Available Today |
![Page 31: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/31.jpg)
31
GPU Developments for Turbine Engine CFD
(Green color indicates CUDA-ready during 2013)

| Developer | Location | Software |
|---|---|---|
| Turbostream | England, UK | Turbostream 3.0 |
| Oxford / Rolls Royce | England, UK | OP2 / Hydra |
| ANSYS | USA | ANSYS CFD 15.0 (Fluent + CFX) |
| ANSYS | USA | ANSYS Fluent 15.0 |
| FluiDyna | Germany | Culises for OpenFOAM 2.2.0 |
| Vratis | Poland | Speed-IT for OpenFOAM 2.2.0 |
| Cascade Technologies | USA | CHARLES |
| Convergent Science | USA | Converge CFD |
| Sandia NL / Oak Ridge NL | USA | S3D |
| Naval Research Lab | USA | JENRE |
| Aviadvigatel OJSC | Russia | GHOST CFD |

Application areas: Turbomachinery; Combustor; Nozzle / Noise
![Page 32: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/32.jpg)
32
Kepler: Fastest, Most Efficient HPC Architecture Ever
- SMX: 3x Performance per Watt
- Hyper-Q: Easy Speed-up for Legacy MPI Apps
- Dynamic Parallelism: Parallel Programming Made Easier than Ever
![Page 33: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/33.jpg)
33
Tesla Kepler Family: World’s Fastest and Most Efficient HPC Accelerators

| GPU | Single Precision Peak (SGEMM) | Double Precision Peak (DGEMM) | Memory Size | Memory Bandwidth (ECC off) | System Solution | Target workloads |
|---|---|---|---|---|---|---|
| K20X | 3.95 TF (2.90 TF) | 1.32 TF (1.22 TF) | 6 GB | 250 GB/s | Server only | Weather & Climate, Physics, BioChemistry, CAE, Material Science |
| K20 | 3.52 TF (2.61 TF) | 1.17 TF (1.10 TF) | 5 GB | 208 GB/s | Server + Workstation | Weather & Climate, Physics, BioChemistry, CAE, Material Science |
| K10 | 4.58 TF | 0.19 TF | 8 GB | 320 GB/s | Server only | Image, Signal, Video, Seismic |
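The parenthesized SGEMM and DGEMM figures can be read as the fraction of theoretical peak that a dense matrix multiply reaches. The sketch below computes that fraction for the K20X and K20 from the slide's numbers; it assumes the value in parentheses is the GEMM rate and the value before it is the peak, and the dictionary layout is illustrative.

```python
# Teraflops from the Tesla Kepler table: peak rate and GEMM rate.
KEPLER_TF = {
    "K20X": {"sp_peak": 3.95, "sgemm": 2.90, "dp_peak": 1.32, "dgemm": 1.22},
    "K20":  {"sp_peak": 3.52, "sgemm": 2.61, "dp_peak": 1.17, "dgemm": 1.10},
}


def efficiency_pct(achieved_tf, peak_tf):
    """Achieved rate as a percentage of theoretical peak."""
    return 100.0 * achieved_tf / peak_tf


efficiency = {
    gpu: {
        "sgemm": efficiency_pct(tf["sgemm"], tf["sp_peak"]),
        "dgemm": efficiency_pct(tf["dgemm"], tf["dp_peak"]),
    }
    for gpu, tf in KEPLER_TF.items()
}

for gpu, eff in efficiency.items():
    print(f"{gpu}: SGEMM {eff['sgemm']:.0f}% of peak, DGEMM {eff['dgemm']:.0f}% of peak")
```

On these figures double-precision GEMM runs much closer to peak than single precision on both boards.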
![Page 34: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/34.jpg)
34
NVIDIA CUDA GPU Roadmap
[Chart: DP GFLOPS per Watt (log axis, 0.5 to 32) by year, 2008 to 2014, for T10, Fermi, Kepler, Maxwell, and Volta. Feature labels: CUDA, FP64, Dynamic Parallelism, Unified Virtual Memory, Stacked DRAM. Annotations: ~ 7x Fermi, ~ 2x Kepler; ~ 13x Fermi, ~ 4x Kepler]
What to Expect from NVIDIA:
• Increasing number of more flexible cores
• Larger and faster memories (6 GB today)
• Enhanced programming and standards
• Tighter integration with systems
![Page 35: Stan Posey NVIDIA, Santa Clara, CA, USA; sposey@nvidia · ANSYS Nexxim 15.0 Q4-2013 + CUDA 5 Kepler Tuning + Multi-GPU AMG Solver; ... Maxwell Volta Stacked DRAM Unified Virtual Memory](https://reader033.vdocuments.mx/reader033/viewer/2022051509/5ad11df97f8b9ae2138e7ac8/html5/thumbnails/35.jpg)
35
NVIDIA GRID Enabled Virtual Desktop
[Diagram: VDI stack. Virtual desktops; virtual machine with NVIDIA Driver; NVIDIA GRID enabled hypervisor; NVIDIA GRID GPU]